Detect all kinds of vegetables and fruits values within a column - google-sheets

I am trying to figure out if there's a more efficient way to detect all the vegetables and fruits that exist within the data contained in a certain column in this Google Sheets test, Test 1. I have tried some "manual" approaches, such as hard-coding a number of identified terms like this:
=arrayformula(regexreplace(first cell in first row ;"(?i)car|mother|boxes|houses|person"; "") )
It seems to work fine, but I think there might be a more efficient way, since there are many other fruits and vegetables in the text, some of them even in other languages. Moreover, I want to delete the commas left in the cleaned column. Any ideas?

You can try the formula below:
=TEXTJOIN(", ";1;INDEX(QUERY(FLATTEN(TRIM(SPLIT(A3;",")));
"select Col1 where not Col1 matches 'machines|cold treatment|red car|houses|person|airplanes|ferrari|refrigerator'";0)))
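To see what the formula above does step by step, the same logic can be sketched in Python. This is only an illustration outside Sheets, not something to paste into a cell; the function name and the sample cell text are made up for the example. It assumes, as with QUERY's `matches`, that an item is dropped only when the whole trimmed item matches one of the listed alternatives.

```python
import re

# Exclusion list copied from the formula above; extend it as needed.
EXCLUDED = r"machines|cold treatment|red car|houses|person|airplanes|ferrari|refrigerator"


def keep_unlisted(cell: str, pattern: str = EXCLUDED) -> str:
    """Split on commas, trim each item, drop excluded items, rejoin."""
    # SPLIT + TRIM: break the cell on commas and strip surrounding spaces.
    items = [part.strip() for part in cell.split(",")]
    # QUERY's `matches` has to match the whole value, hence fullmatch.
    kept = [item for item in items if item and not re.fullmatch(pattern, item)]
    # TEXTJOIN(", "; 1; ...): rejoin, so no stray commas are left behind.
    return ", ".join(kept)


# Made-up sample cell, not taken from the asker's sheet.
print(keep_unlisted("apple, red car, banana ,houses,kiwi"))
```

Because the items are rejoined from a filtered list, the leftover commas that the REGEXREPLACE approach leaves behind never appear.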

Related

Only apply complex arrayformula() to rows with certain value in dataset

I have a quite complex formula (complex to me, I mean) that Tom Sharpe helped me build to aggregate values and order them by month in a row (you can find the details in the original post, but I think you'll only need the final formula), which is:
=ArrayFormula(mmult(sequence(1,counta(A2:A),1,0), if((C2:index(C:C,counta(C:C))<=eomonth(G2,sequence(1,datedif(G2,H2,"M")+1,0)))* (D2:index(D:D,counta(D:D))>=eomonth(G2,sequence(1,datedif(G2,H2,"M")+1,0))),E2:index(E:E,counta(E:E)),0)))
and here is the result -> [J1:U1]
Now, what I need to do as the final step is to group the data by a certain label (John or Jane in the example) on separate rows, while maintaining the order/aggregation by month along the row. In the example, this would mean having one row with only 'John' data and, below it, one with 'Jane' values.
I am struggling to understand how to adapt the formula to do so.
I have tried:
Using another array to first return a list of these labels with query(unique()) or something like that, but then I struggle to loop over it with the other formula.
A bit more simplistic, but it could work after all: writing 'John' on the 1st row (in the cell next to where the data will be returned) and 'Jane' on row 2, then using filter() to pull only the data that matches. The 'John, Jane' values are just for the example, but the real labels won't be that many, and the list of labels doesn't need to be dynamic.
The thing with these solutions is that they work when used separately, but I can't figure out how to nest them in the first arrayformula() that Tom helped me with, as I am just beginning with Google Sheets queries.
I don't necessarily need the complete formula/code, but maybe just directions or tips to visualize how I could solve this.
Thanks to all who might contribute
With hindsight, I might have done better to go down the route of using a query to calculate the sums in my previous answer rather than Mmult.
This uses the same method as before to create a 2d array of amounts vs dates (going across) and individuals (going down). Then it uses Textjoin to generate a query to group by name with the required number of columns.
=ArrayFormula(query({A2:A,if((C2:C<=eomonth(G2,sequence(1,datedif(G2,H2,"M")+1,0)))* (D2:D>=eomonth(G2,sequence(1,datedif(G2,H2,"M")+1,0))),E2:E,0)},
"select Col1,sum(Col"&textjoin("),sum(Col",,sequence(1,datedif(G2,H2,"M")+1,2))&") where Col1 is not null group by Col1"))
This is the generated query
select Col1,sum(Col2),sum(Col3),sum(Col4),sum(Col5),sum(Col6),sum(Col7),sum(Col8),sum(Col9),sum(Col10),sum(Col11),sum(Col12),sum(Col13) where Col1 is not null group by Col1
Ideally there should be an extra section saying label sum(Col2) '' etc. to suppress the 'Sum' headers.
=ArrayFormula(query({A2:A,if((C2:C<=eomonth(G2,sequence(1,datedif(G2,H2,"M")+1,0)))* (D2:D>=eomonth(G2,sequence(1,datedif(G2,H2,"M")+1,0))),E2:E,0)},
"select Col1,sum(Col"&textjoin("),sum(Col",,sequence(1,datedif(G2,H2,"M")+1,2))&") where Col1 is not null group by Col1 label sum(Col" & textjoin(") '', sum(Col",,sequence(1,datedif(G2,H2,"M")+1,2)) & ") ''"))
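The string concatenation in these formulas is easier to follow when written out as ordinary code. The Python sketch below (the function name and parameters are made up for illustration) builds the same query text that the TEXTJOIN/SEQUENCE part generates, assuming the month columns start at Col2 and that `month_count` stands in for datedif(G2,H2,"M")+1.

```python
def build_group_query(month_count: int, suppress_labels: bool = False) -> str:
    """Build the 'group by Col1' query text for a given number of month columns."""
    # Col1 holds the name; month columns run Col2..Col(n+1), like sequence(1, n, 2).
    sums = [f"sum(Col{i})" for i in range(2, month_count + 2)]
    query = f"select Col1,{','.join(sums)} where Col1 is not null group by Col1"
    if suppress_labels:
        # Give each sum column an empty label to hide the 'sum' headers.
        query += " label " + ", ".join(f"{s} ''" for s in sums)
    return query


# Twelve month columns reproduce the generated query shown above.
print(build_group_query(12))
# With labels suppressed (two columns only, to keep the output short).
print(build_group_query(2, suppress_labels=True))
```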

FLATTEN skipping blank cells without using UNIQUE

I'm trying to turn an array into a single column without blank cells, considering that the input will always have some blank cells and that there might be repeated values. I'm trying to use FLATTEN, but it keeps the blanks, and UNIQUE would remove the repeated values, so I can't use it.
I also thought about using something like FLATTEN(QUERY(X:X, "select * WHERE col1,col2,col3,col4,col5 IS NOT NULL")), but the number of columns might be dynamic, so I can't say precisely which columns to use.
My input:
Desired output:
Sample sheet here
Any clue?
Use QUERY on the outside:
=QUERY(FLATTEN(YOUR-RANGE-HERE),"Select * WHERE Col1 Is Not Null")
I left an example in your spreadsheet, cell G2.
=flatten(filter(A1:E10;not(isblank(A1:E10))))
or
=flatten(filter(A1:E10;len(A1:E10)))
or
=filter(flatten(A1:E10);len(flatten(A1:E10)))
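The behaviour being asked for (flatten row by row, drop only the blanks, keep repeated values) can be sketched in Python as follows. This is an illustration of the intended result rather than a Sheets formula; the function name and the sample grid are made up.

```python
def flatten_non_blank(grid):
    """Flatten a 2D grid row by row, skipping blanks but keeping duplicates."""
    # Row-major order, like FLATTEN; only empty strings and None count as blank.
    return [cell for row in grid for cell in row if cell not in ("", None)]


# Made-up sample grid with blanks and repeated values.
sample = [
    ["a", "", "b"],
    ["", "a", ""],
    ["c", None, "b"],
]
print(flatten_non_blank(sample))  # ['a', 'b', 'a', 'c', 'b']
```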

Extract values from a range with 2 columns only if the value in column 1 contains a specific word in the column 2

I need to extract each individual person from a list that doesn't contain a certain activity (Project). Sounds easy but I can't quite get to the end of it.
Please check the example here on Sheet 2:
https://docs.google.com/spreadsheets/d/1qjbjXFCYj1qXrVVGNnhOj11asxT_o1xHWXerRqAl1UQ/edit#gid=2105763617
Here's the logic.
First I attempted to see if the individual only occurs once and if the Activity is not "Project"
=IF(A2<>"",IF(and(COUNTIF(A:A,A2)=1,B2<>"Project"),0,1),"")
Then I just extract the name that satisfies this criteria:
=query(ARRAYFORMULA(iF(I2:I=0,A2:A,"")), "where Col1 <>'' ")
This works, except that there might be multiple assignments for the same person that do not contain the activity "Project", which my formula doesn't account for, nor is it a simple dynamic arrayformula.
=UNIQUE(FILTER(A2:A, B2:B<>"Project"))
=UNIQUE(QUERY(A2:B, "select A where B <>'Project'", 0))
=UNIQUE(FILTER(A2:A, B2:B<>"Project",
NOT(REGEXMATCH(A2:A, "^"&TEXTJOIN("$|^", 1, FILTER(A:A, B:B="Project"))&"$"))))
While @player0's answer solves the question, it took a big performance hit on a sheet with >1000 rows.
Instead, I extracted all names that have a "Project" row and all names that have a non-"Project" row, then removed the former from the latter to eliminate names that appear in both.
=UNIQUE(FILTER(UNIQUE(FILTER(A2:A, B2:B<>"Project")), ISNA(MATCH(UNIQUE(FILTER(A2:A, B2:B<>"Project")), UNIQUE(FILTER(A2:A, B2:B="Project")),0))))
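The set-difference idea behind this formula can be sketched in Python. The function name and sample rows below are made up for illustration; the sketch assumes each row is a (name, activity) pair and that the activity is compared for exact equality, as in the formula.

```python
def people_without(rows, activity="Project"):
    """Return unique names, in first-seen order, that never have the given activity."""
    # Names that have at least one row with the activity (the 'Project' list).
    flagged = {name for name, act in rows if act == activity}
    result = []
    for name, _ in rows:
        # Keep a name once, and only if it never appears in the flagged set.
        if name and name not in flagged and name not in result:
            result.append(name)
    return result


# Made-up sample data, not taken from the linked sheet.
rows = [
    ("Ann", "Project"),
    ("Bob", "Design"),
    ("Ann", "Design"),
    ("Cy", "Test"),
    ("Bob", "Test"),
]
print(people_without(rows))  # ['Bob', 'Cy']
```

Looking names up in a set is a single membership check per row, which is the same reason the MATCH-based formula scales better than building one long regular expression.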
You may try this also:
{=IFERROR(INDEX($A$2:A$25,MATCH(0,IF($C$1<>$B$2:$B$25,COUNTIF($F$1:$F1,$A$2:$A$25), ""), 0)),"")}
N.B.
Cell C1 holds the criterion, Project; using a cell reference makes the formula dynamic rather than hard-coded.
Enter this formula in cell F2, finish with Ctrl+Shift+Enter, and fill down.

How to concatenate strings and select the same columns multiple times using Query (Google Sheets)

I am trying to generate a table for the Gantt chart. Table should have this format:
https://developers.google.com/chart/interactive/docs/gallery/ganttchart#data-format
So, I need the task name to be the same as the task ID, but in Query I can't use Col1 twice (I get an error):
=QUERY({Tab1;Tab1};"select Col1,Col1,Col5,Col16,Col17 WHERE Col16>now() ORDER BY Col5 DESC,Col17 ";0)
The second point is that it is also not possible to merge two columns in the result, so this doesn't work:
=QUERY({Tab1;Tab1};"select Col1+Col7,Col1,Col5,Col16,Col17 WHERE Col16>now() ORDER BY Col5 DESC,Col17 ";0)
Here is my data and the 2 results that I need to get by QUERY:
https://docs.google.com/spreadsheets/d/1CZYgfYo6oIeONZOH6ZR5rOW615HuH4ICaoe7lj0dapw/edit#gid=0
These are such trivial things in real SQL; is there no way to do them straightforwardly in Google Query? So far I have found a combination of QUERY and ARRAYFORMULA, but that leads to very complicated, mutant queries. Isn't there an easier way?
You don't need Query, just Arrays.
You will get the first result from this code:
={ARRAYFORMULA(B3:B&" "&C3:C)\A3:A}
The second result from this code:
={A3:A\A3:A\B3:B}
Based on your example, I assume that you are not using US spreadsheet settings.
If you are using US settings, the formulas have to be changed to:
First:
={ARRAYFORMULA(B3:B&" "&C3:C),A3:A}
Second:
={A3:A,A3:A,B3:B}
Link to working example: https://docs.google.com/spreadsheets/d/1eMkOkyFwvDeYSy-8UlhQum4OWcb-4WJqGxy_CXM8pVs/edit?usp=sharing
I see that in your real sheet you would like to compare some data with now(). You can easily do this by using the array I propose as the source for Query. You would then have something like this (of course it will not work as is; it's only an example, since the array has only 2 columns, not 15):
=QUERY({ARRAYFORMULA(B3:B10&" "&C3:C10)\A3:A10};"select * where Col15>now()";0)
About Query: you can't perform arithmetic operations on a column containing strings. See the documentation: https://developers.google.com/chart/interactive/docs/querylanguage#arithmetic-operators
"I can't use Col1 twice (I get error)"
You can duplicate your input data to solve this.
QUERY({Tab1 Column 1\Tab1 Column 1};"Select Col1, Col2......"
"Tab1 Column 1" is now Col1 and Col2
"The second point is that it is also not possible to merge two columns as a result, so it doesn't work:"
Yes, adding the results of columns is possible; "select Col1+Col7......" is correct.

How to definitely use column names in Google Sheet Query

The query function doesn't let you use column names; instead, you have to use letters if you refer to a cell range, or ColN if you refer to an array.
This is very annoying, most of all when you alter the queried table by adding, deleting or exchanging columns.
I would like to use column names, like in a standard SQL query.
You can actually get around this by splitting the Query formula and using other formulas to automatically get the desired column references from a list.
For example, if you have a table in range A1:E15 with headers "H1, H2, H3, H4, H5", and you'd like to get only columns H3 & H5:
Store the desired headers (H3 & H5) in another table/range as a list; let's say this range is G1:G2.
Use the MATCH formula along with the TextJoin formula to generate a concatenated string like Col3, Col5:
=TextJoin(", ",TRUE,ArrayFormula(IFERROR("Col"&MATCH(G1:G6,$A$1:$E$1,0),"")))
Let's say this was in cell H1.
You can refer to this cell in your Query formula as below:
=QUERY({A1:E20},"SELECT "&H1&" WHERE Col2='w'")
You can see it in action in below screenshot:
One solution could be resorting to a custom function created by a script, but when your table is not so small, you will surely run into errors from exceeding the computation time limit.
The most efficient solution (using only native functions) I found is as follows.
Suppose you are working on a sheet range, your column names are in row 1 and you want to refer to the column "salary"; you can obtain the column letter by
substitute(address(1,match("salary",A1:1,0),4),"1","")
Instead, if you are querying arrays, it is simpler; the string you need is
"Col"&match("salary",A1:1,0)
The final query may not be so elegant, but the efficiency is guaranteed:
query(
employeessheet!A:E,
"select "&substitute(address(1,match("salary",employeessheet!A1:1,0),4),"1","")&" where ...",
1)
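As a rough illustration of what the two lookups above compute, here is a Python sketch (the function names and the sample header row are made up). It assumes the header row is a plain list and mirrors MATCH(...,0) for the position, ADDRESS/SUBSTITUTE for the column letter, and the "Col"&... concatenation for arrays.

```python
def column_letter(index: int) -> str:
    """Convert a 1-based column index to its A1-style letters (1 -> A, 27 -> AA)."""
    letters = ""
    while index > 0:
        # Shift to 0-based before dividing so that 26 maps to Z, not to "A@".
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def column_refs(headers, name):
    """Return (letter, ColN) for a header name, like MATCH(name, header_row, 0)."""
    pos = headers.index(name) + 1  # raises ValueError if the header is missing
    return column_letter(pos), f"Col{pos}"


# Made-up header row for illustration.
headers = ["name", "dept", "salary", "start", "end"]
letter, col_n = column_refs(headers, "salary")
print(f"select {letter} where ...")   # range-style query text
print(f"select {col_n} where ...")    # array-style query text
```

If the "salary" column is later moved, only the header lookup changes; the query text is rebuilt from the new position.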

Resources