How to split multiple items in one cell - google-sheets

I'm struggling with a problem in Google Sheets.
How can I split the text in one cell if the cell looks like the one below?
product
Ax2, B, C, D, Ex3
I'd like to turn it into this:
product | quantity
A | 2
B | 1

Assuming the string is in A1, try
={"Product", "Quantity"; ArrayFormula(split(transpose(split(A1, ", ", 0)&if(regexmatch(split(A1, ", ", 0), "x\d*$"), , "x1")), "x"))}
Change range to suit.
If you want to process multiple rows and sum the quantity per product, you can try
=query(ArrayFormula(split(flatten(split(A1:A4, ", ", 0)&if(regexmatch(split(A1:A4, ", ", 0), "x\d*$"), , "x1")), "x")), "Select Col1, sum(Col2) where Col1 <>'' group by Col1 label Col1 'Product', sum(Col2) 'Quantity'", 0)

Assuming you wanted to aggregate many rows of these strings and that the first string is in A2:
=ArrayFormula(QUERY(SPLIT(TRIM(FLATTEN(SPLIT(REGEXREPLACE(FILTER(A2:A,A2:A<>"")&",",",","x1,"),","))),"x"),"Select Col1, SUM(Col2) GROUP BY Col1 LABEL Col1 'product', SUM(Col2) 'quantity'"))
This is a little different from the solution suggested by JPV in that it pre-FILTERs results so that you don't have to name the range specifically to avoid errors. It also takes a different approach to the transformation.
How It Works:
FILTER(A2:A,A2:A<>"")
This ensures that the larger formula works only on non-blank cells in the column, which avoids errors.
&","
This is appended to the end of every string, so that every element of every string ends with a comma temporarily. This will be used in the following transformation step.
REGEXREPLACE(_________,",","x1,")
Every comma is then replaced with x1,. Note that even list entries that already have an x2, x3, etc. after them will get this piece added for now. For instance, an original string such as "Ax2, B, C, D, Ex3" will look like this in memory at this point: "Ax2x1, Bx1, Cx1, Dx1, Ex3x1," (including the temporary trailing comma). It won't matter later.
SPLIT(__________,",")
Strings will then be SPLIT at the comma. For instance, the example string will look like this in memory:
Ax2x1 | Bx1 | Cx1 | Dx1 | Ex3x1
FLATTEN will turn these horizontal arrays into one long vertical array.
TRIM will cut out superfluous spaces.
So the above entry will now look like this in memory:
Ax2x1
Bx1
Cx1
Dx1
Ex3x1
SPLIT(_________,"x")
Now, each of those strings will be SPLIT again at the 'x', leaving this:
A | 2 | 1
B | 1
C | 1
D | 1
E | 3 | 1
QUERY(___________,"Select Col1, SUM(Col2) GROUP BY Col1 LABEL Col1 'product', SUM(Col2) 'quantity'"))
Finally, QUERY will act on all of these, Selecting only the first column and the sum of the second column per unique element of the first. (The third virtual column will be ignored.) Chosen labels are then assigned to each of the remaining two columns.
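For readers who find it easier to follow the steps outside of Sheets, here is a rough Python sketch of the same pipeline. It is only an illustration of the logic described above, not the formula itself; the function name aggregate_quantities and the second sample string are made up for the example. Like the formula, it assumes product names do not contain the letter x.

```python
from collections import defaultdict


def aggregate_quantities(cells):
    """Sum quantities per product from strings such as 'Ax2, B, C, D, Ex3'."""
    totals = defaultdict(int)
    for cell in cells:
        if not cell:                                # FILTER: skip blank cells
            continue
        marked = (cell + ",").replace(",", "x1,")   # append "," then replace "," with "x1,"
        for piece in marked.split(","):             # SPLIT at the comma
            piece = piece.strip()                   # TRIM
            if not piece:                           # empty piece after the trailing comma
                continue
            parts = piece.split("x")                # SPLIT at 'x', e.g. ['A', '2', '1']
            totals[parts[0]] += int(parts[1])       # only the first two columns are used
    return dict(totals)


# Hypothetical input: the string from the question plus one extra row.
print(aggregate_quantities(["Ax2, B, C, D, Ex3", "A, Bx4"]))
```

The third element produced by the temporary x1 is simply never read, which mirrors how QUERY ignores the third virtual column.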

Related

How to get unique values in a column, including cells with multiple values separated by commas, in Google Sheets?

I have a column in a Google Sheet which, in some cases, includes multiple values separated by commas, like this:
Value
A example
B example
C example
D example
A example, E example
A example, F example
G example, D example, C example
I would like to count all occurrences of the unique values in this column, so the count should look like:
Unique value | Occurrences
A example | 3
B example | 1
C example | 2
D example | 2
E example | 1
F example | 1
G example | 1
Currently, however, when I use =UNIQUE(A2:A), the result gives this:
Unique value | Occurrences
A example | 1
B example | 1
C example | 1
D example | 1
A example, E example | 1
A example, F example | 1
G example, D example, C example | 1
Is there a way I can count all of the instances of each value, whether it appears on its own in a cell or alongside other values in a cell (comma-separated)?
(This looks like a useful answer in Python, but I'm trying to do this in Google Sheets)
try:
Formula in C1:
=INDEX(QUERY(IFERROR(FLATTEN(SPLIT(A1:A,", ")),""),"Select Col1, count(Col1) where Col1 is not null group by Col1 label count(Col1) ''"))
Or, as per the comments, split on the combination instead:
=INDEX(QUERY(IFERROR(FLATTEN(SPLIT(A1:A,", ",0)),""),"Select Col1, count(Col1) where Col1 is not null group by Col1 label count(Col1) ''"))
2nd EDIT: To order descending by count use:
=INDEX(QUERY(IFERROR(FLATTEN(SPLIT(A1:A,", ",0)),""),"Select Col1, count(Col1) where Col1 is not null group by Col1 Order By count(Col1) desc label count(Col1) ''"))
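In case it helps to see the idea outside of Sheets, this is a small Python sketch of what the formulas above do: split each cell on the comma-plus-space combination, flatten everything into one list, and count each value. The function name count_values is just a name picked for this example; the data is the sample column from the question.

```python
from collections import Counter


def count_values(cells, sep=", "):
    """Count every value, whether it sits alone in a cell or in a separated list."""
    counts = Counter()
    for cell in cells:
        if not cell:                     # skip blank cells, like "where Col1 is not null"
            continue
        # Split on the whole separator string, as SPLIT(..., ", ", 0) does.
        counts.update(part for part in cell.split(sep) if part)
    return counts


column = [
    "A example",
    "B example",
    "C example",
    "D example",
    "A example, E example",
    "A example, F example",
    "G example, D example, C example",
]

# most_common() lists the highest counts first, similar to "order by count(Col1) desc".
for value, occurrences in count_values(column).most_common():
    print(value, occurrences)
```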
Assuming data in A1:A7:
In C1:
=SORT(UNIQUE(FLATTEN(ARRAYFORMULA(SPLIT(A1:A7,", ")))))
In D1:
=ARRAYFORMULA(MMULT(0+ISNUMBER(SEARCH(", "&ColumnCSpilledRange&", ",", "&TRANSPOSE(A1:A7)&", ")),ROW(A1:A7)^0))
Replace ColumnCSpilledRange appropriately.

SUMPRODUCT of last nth values with criteria

I have two columns (see screenshot)
How can I create a formula that sums the LATEST values in the second column, subject to a criterion on column A?
For example, I need the sum of the last 6 values (from column B) for the cells (in column A) that start with HH, counting from the bottom.
I know how to sum all values (from column B) whose column A entry starts with HH:
=SUMIF(A1:A;"HH"&"*";B1:B)
P.S. HH and * are separate because I'll replace HH with a cell reference,
but now I need to limit this to the last N values (let's say the last 3 values).
P.P.S.
=SUMPRODUCT((COUNTIFS(A1:A;"exact text";ROW(A1:A)*{1;1};">="&ROW(A1:A)*{1;1})<=3)*(A1:A="exact text");B1:B)
So far this works ONLY if I write the exact text, not with patterns like HH*
Maybe try
=sum(index(query({row(A1:A), A1:B}, "Select Col3 where Col2 contains 'HH' order by Col1 desc limit 6")))
and see if that works?
Note:
*the string HH can also be in a cell (ex. D1)
=sum(index(query({row(A1:A), A1:B}, "Select Col3 where Col2 contains '"&D1&"' order by Col1 desc limit 6")))
*6 indicates the number of values you want to sum
EDIT: For your locale you'll need to use in G1
=sum(index(query({row($B$1:$B) \ $B$1:$C}; "Select Col3 where Col2 contains '"&E2&"' order by Col1 desc limit 3")))
and fill down. See if that works?
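As a sanity check on the logic, here is a rough Python sketch of what the query is doing: keep the rows whose label contains the search text, then sum the values of the last N of them. The function name sum_last_n and the sample rows are invented for illustration and are not taken from the asker's sheet.

```python
def sum_last_n(rows, needle, n):
    """Sum the values of the last n rows whose label contains needle.

    rows is a list of (label, value) pairs in sheet order, top to bottom.
    """
    if n <= 0:
        return 0
    # "where Col2 contains ..." (use label.startswith(needle) for a strict prefix match)
    matches = [value for label, value in rows if needle in label]
    # "order by Col1 desc limit n": the last n matches, counted from the bottom
    return sum(matches[-n:])


# Hypothetical data standing in for columns A and B.
rows = [("HH1", 5), ("AB2", 7), ("HH3", 1), ("HH4", 2), ("XY5", 9), ("HH6", 4)]
print(sum_last_n(rows, "HH", 3))
```

If fewer than N rows match, the sketch simply sums whatever is there, which is also how limit behaves in the query.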
This should also work, though I'm not sure it is any easier to understand or less complicated than the other approaches:
=SUM(SORTN(REGEXMATCH(B:B;E2)*C:C;3;0;ROW(B:B)*REGEXMATCH(B:B;E2);0))
Note the number 3, which is the number of values you want from the bottom, and the reference to E2, which contains "HH" as on your sample sheet.
use:
=QUERY(FILTER({IFNA(REGEXEXTRACT(SORT(B2:B; ROW(B2:B); 0);
"^([A-Za-z]{1,3})\d"))\SORT(C2:C; ROW(B2:B); 0)}; COUNTIFS(
REGEXEXTRACT(SORT(B2:B; ROW(B2:B); 0); "^([A-Za-z]{1,3})\d");
REGEXEXTRACT(SORT(B2:B; ROW(B2:B); 0); "^([A-Za-z]{1,3})\d");
ROW(H2:H43); "<="&ROW(H2:H43))<=3);
"select Col1,sum(Col2) group by Col1 label sum(Col2)''")
full explanation here

How can I avoid having to put 0s into the NULL fields to get a correct query calculation in Google Sheets

I have a Google Sheets question, which I have not been able to figure out yet with Google-Fu and RTFM:
Take the following spreadsheet as an example:
https://docs.google.com/spreadsheets/d/1IvMVaUdUDfYOoKyG0Uwd2n0M1mLjOTE5yZQ9K2R3q2M/edit?usp=sharing
In case the sheet gets lost in time, I am going to post its contents here:
Sheet1:
foo | withdrawal | deposit
C | 4 | 10
D | 10 |
E | 10 | 4
As you can see, the deposit field in the row where foo is D is empty, i.e. null.
Sheet2:
foo | balance
C | =INDEX(QUERY({Sheet1!$A$2:C}, "SELECT SUM(Col3) - SUM(Col2) WHERE Col1 = '"&A2&"'"), 2)
D | =INDEX(QUERY({Sheet1!$A$2:C}, "SELECT SUM(Col3) - SUM(Col2) WHERE Col1 = '"&A3&"'"), 2)
E | =INDEX(QUERY({Sheet1!$A$2:C}, "SELECT SUM(Col3) - SUM(Col2) WHERE Col1 = '"&A4&"'"), 2)
The result is
foo | balance
C | 6
D |
E | -6
As you see, the balance field for the category D is null, although it should be -10.
The fix for that is to put a 0 into the deposit field in Sheet1 explicitly.
In my example, I get that data using a csv-export, and fields are generally empty and not 0, and it is cumbersome to add the 0 there. Is there a way to have something like COALESCE in that sum there (like in SQL)?
Please let me know.
It seems like something quite a bit simpler would avoid the problem:
=SUMPRODUCT(Sheet1!C:C-Sheet1!B:B,Sheet1!A:A=A2)
for cell B2.
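To spell out why this avoids the problem: the arithmetic treats an empty cell as 0, which is effectively the COALESCE you were asking about. Here is a minimal Python sketch of that idea, with None standing in for an empty cell. The function name balance is made up for the example, and the sample rows assume that D has a withdrawal of 10 and no deposit, which is what the expected balance of -10 implies.

```python
def balance(rows, key):
    """Return deposits minus withdrawals for one key, treating empty cells as 0."""
    total = 0
    for foo, withdrawal, deposit in rows:
        if foo == key:
            # "x or 0" plays the role of COALESCE(x, 0) for None (empty) cells.
            total += (deposit or 0) - (withdrawal or 0)
    return total


# (foo, withdrawal, deposit); None marks an empty cell from the CSV export.
sheet1 = [("C", 4, 10), ("D", 10, None), ("E", 10, 4)]

for key in ("C", "D", "E"):
    print(key, balance(sheet1, key))
```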
Why don't you just add this in cell A1 of Sheet2 instead of all the Query:
=arrayformula({Sheet1!A1,"balance";if(Sheet1!A2:A<>"",{Sheet1!A2:A,Sheet1!C2:C-Sheet1!B2:B},)})
Obviously ensure cells Sheet2!A2:A and Sheet2!B1:B are empty.
If you have duplicate values of foo, try:
=arrayformula(query({Sheet1!A1,"balance";if(Sheet1!A2:A<>"",{Sheet1!A2:A,Sheet1!C2:C-Sheet1!B2:B},)},"select Col1,sum(Col2) where Col1 is not null group by Col1 label sum(Col2) 'balance'",1))
A better option for a single-cell formula, referencing multiple sheets would be:
=arrayformula(query(
{Sheet1!A:A,n(Sheet1!B:C);Sheet2!A2:A,n(Sheet2!B2:C);Sheet3!A2:A,n(Sheet3!B2:C)},
"select Col1,sum(Col3)-sum(Col2) where Col1 is not null group by Col1 label sum(Col3)-sum(Col2) 'balance' ",1))

SUM last N values with criteria

I have a simple table that looks like this:
All I need is to SUM the points for a specific player (John) in his last 3 matches.
I was able to come up with this formula:
SUMPRODUCT(LARGE((A2:B="John")*(C2:D);{1;2;3}))
The problem is that, instead of what I was looking for, it sums the highest 3 values, which can be anywhere in that range.
Is there a similar formula that uses only the last 3 matches?
I think a SUMPRODUCT can get you there with some constructed arrays using a COUNTIFS() and ROW() to get the most recent 3.
This formula:
=SUMPRODUCT((COUNTIFS(A:B,G2,ROW(A:B)*{1,1},">="&ROW(A:B)*{1,1})<=3)*(A:B=G2),C:D)
on this sheet I made seems to work.
I think I have a formula that gives what you want. It's not pretty, and I'm sure it can be made simpler, but this works:
=query( query(
{ arrayformula( {ROW(A1:A) } ),
query(A1:D,"select A, B, C, D",1)
} , "select * order by Col1 desc",1),
"select Col2, Col3, Col4, Col5
where (Col2 ='John' or Col3 = 'John')
order by Col1 desc limit 3",1)
Basically, it adds the row number as an extra column to the data, so that we can sort the data in reverse order by row number. Then we query the result to find the first three occurrences of 'John', in either Col A or Col B.
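If it is easier to reason about outside of Sheets, the same idea can be sketched in Python: number the rows, sort by row number in reverse, keep the rows where the player appears in either name column, and take the first three. The layout (name 1, name 2, points 1, points 2), the function names and the sample matches are all assumptions made for this illustration, since the original table is not shown here.

```python
def last_matches(rows, player, n=3):
    """Return the last n rows in which player appears in either name column."""
    numbered = list(enumerate(rows, start=1))               # add the row number
    numbered.sort(key=lambda pair: pair[0], reverse=True)   # order by row number desc
    hits = [row for _, row in numbered if player in (row[0], row[1])]
    return hits[:n]                                         # limit n


def points_for(row, player):
    """Pick the points column that belongs to player in this row."""
    name1, _name2, points1, points2 = row
    return points1 if name1 == player else points2


def sum_last_points(rows, player, n=3):
    return sum(points_for(row, player) for row in last_matches(rows, player, n))


# Hypothetical matches: (name 1, name 2, points 1, points 2), oldest first.
matches = [
    ("John", "Mike", 3, 1),
    ("Anna", "John", 2, 5),
    ("Mike", "Anna", 4, 4),
    ("John", "Anna", 0, 2),
    ("Mike", "John", 1, 6),
]
print(sum_last_points(matches, "John"))
```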
Here is a sample sheet:
https://docs.google.com/spreadsheets/d/1-mhTb5Cpp3D-1OltlmCfwlmM-vc2OknHxfJAyHD7BjI/edit?usp=sharing
Credit to Erik Tyler for a previous answer on a different question, on how to add the row number to a query.
Edit: Updated the sheet to provide the SUM of John's (or any player's) scores from the last three matches. This can be combined with the previous formula, if you want a single formula to place somewhere. Or will you have a list of all the players, and you'll want their last three scores beside each of their names?
If I can simplify the formula, I'll update it here.
Let me know if you need something more than this, or if this has answered your question.
Approach
I would use the query formula to get the cells that you need so that you can leverage the limit statement.
You should put a column with the indexes so that you can order the cells in descending order and take the first 3.
Given that your table headers are:
+-----------------------------------------------+
| INDEX | NAME 1 | NAME 2 | POINTS 1 | POINTS 2 |
+-----------------------------------------------+
I would use this query to get your desired result:
=SUMPRODUCT(QUERY(A2:E, "Select D * E where B = 'John' or C = 'John' order by A desc limit 3"))

Grouping by Name in Query Formula not working

So I am using a query function to count the number of instances a particular name appears in column A of another sheet, and display that result in Column B of this sheet with the respective name in Column A. Here is the function:
=ArrayFormula(QUERY(Attendance!A:A&{"",""},"select Col1, count(Col2) where Col1 != '' group by Col1 label count(Col2) 'Count'",1))
The problem is, while it works for the most part, some of the names appear twice, for instance Fred Jones appears as:
Col A | Col B
Fred Jones | 5
Fred Jones | 2
I have looked at the names, and there is no discernible difference between them, so I do not understand why they are not being grouped. Is there a way I can use a wildcard or something similar to get Google to combine the names if they are nearly identical? Any help would be appreciated, thanks as always.
try:
=ARRAYFORMULA(QUERY(TRIM({Attendance!A:A}),
"select Col1,count(Col1)
where Col1 is not null
group by Col1
label count(Col1)'Count'", 1))
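The likely cause is stray whitespace: names that look the same on screen but differ by a trailing or doubled space are grouped separately. A small Python sketch of the same fix, under that assumption, is below. The function names are made up for the example, and Python's split() treats a slightly broader set of characters as whitespace than TRIM does.

```python
from collections import Counter


def normalize(name):
    """Strip leading/trailing whitespace and collapse repeated spaces, like TRIM."""
    return " ".join(name.split())


def count_names(names):
    """Count names after normalizing whitespace, skipping blank entries."""
    return Counter(normalize(name) for name in names if name and name.strip())


# Hypothetical attendance column with invisible whitespace differences.
attendance = ["Fred Jones", "Fred Jones ", " Fred  Jones", "", "Ann"]
print(count_names(attendance))
```

Without normalize, the three variants of Fred Jones would each get their own group, which matches the duplicated rows you are seeing.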

Resources