Excel - sum given a condition in a relative column

Excel - sum given a condition in a relative column - google-sheets

Imagine I have this (time)sheet:
Hours | Text
------+----------------------
3 | fixing PRA-345
4.5 | refactoring PRA-222
5 | PRA-345 and stuff
And I want to calculate how much cumulative time one has spent on a ticket with a given number.
In other words sum the hours based on the text in a neighbouring cell.
Can you do it without extra column? what I did was to make an extra column that returned either the number, if given text was present (via REGEXMATCH) or 0. And then I ran a SUM on that column. Having this solved without extra column would be nice ;)
Expected output
In my case if would be enough for a given string to find the total sum of hours. So if I cell(say it's D1) has the hardwired text, such as "PRA-345" I want the cell to the left(E1) to display the total hours(8 in this case)

Is this what you need?
=sum(filter(B5:B,regexmatch(C5:C,E5)))
Reference:
FILTER
SUM

Instead you can try
=QUERY({A1:A11,ArrayFormula(REGEXEXTRACT(B1:B11,"PRA-\d+"))},
"select Col2, sum(Col1) where Col1 is not null
group by Col2 label Col2 'Tickets', sum(Col1) 'Sums' ",1)
Functions used:
QUERY
ArrayFormula
REGEXEXTRACT

Related

Google sheets Query function with Arrayformula

For each of the email id, I want to get latest 10 records by timestamp. How do I get the results with arrayformula? Query function is not important as long as I can still achieve this with arrayformula. Here is the sample data:
https://docs.google.com/spreadsheets/d/1YAHA02VM-5MXzVKhkxu_eODPKObpoz441mGX8lOFu5M/edit?usp=sharing

Try this on another sheet, row 1:
=arrayformula(query({query({Sheet1!$A:$C},"order by Col1 desc,Col2",1),{"Dupe position";countifs(query({Sheet1!$A2:$C},"select Col2 order by Col1 desc,Col2",0),query({Sheet1!$A2:$C},"select Col2 order by Col1 desc,Col2",0),row(Sheet1!$A2:$C),"<="&row(Sheet1!$A2:$C))}},"select Col1,Col2,Col3 where Col1 is not null and Col4 <= 10 order by Col1",1))
You can adjust the number of records found by adjusting Col4 <= 10, and also the final sort by altering order by Col1 at the end of the formula.
Explanation
This gets the data from Sheet1, sorts it by date desc then email asc:
query({Sheet1!$A:$C},"order by Col1 desc,Col2",1)
Then to the side of this data, a COUNTIFS() is used to get the number each time an email appears in the list above (since it's sorted desc, 1 represents the most recent instance).
countifs(<EmailColumnData>,<EmailColumnData>,row(<EmailColumn>),"<="&row(<EmailColumn>))
In place of <EmailColumnData> in the COUNTIF() is:
query({Sheet1!$A2:$C},"select Col2 order by Col1 desc,Col2",0)
In place of <EmailColumn> above, we only want the row number so we don't need the actual data. We can use:
Sheet1!$A2:$C
Various {} work as arrays to bring the data together.
Eg., {a,b,c;d,e,f} would result in three columns, with a, b, c in row 1 and d, e, f in row 2. , is a new column, ; is a return for a new row.
A final query around everything gets the 3 columns we need, where the count number in col 4 is <=10, then sorts the output by Col1 (date asc).

On second thoughts, maybe this is bit cheeky, but this might do it ( taken from conditional rank idea )
=ArrayFormula(filter(A2:C,countifs(A2:A,">="&A2:A,B2:B,B2:B)<=10,A2:A<>""))
EDIT
The above assumes (because the data is time-stamped) dups shouldn't occur. If they do and the data is pre-sorted, you can use row number as a proxy for time stamp as suggested by #Aresvik.
Alternatively, you could count separately
(a) only rows with a later timestamp
plus
(b) rows with the same time stamp but with earlier (or identical) row number
=ArrayFormula(filter(A2:C,countifs(A2:A,">"&A2:A,B2:B,B2:B)+countifs(A2:A,"="&A2:A,B2:B,B2:B,row(A2:A),"<="&row(A2:A))<=10,A2:A<>""))

I have added a new sheet ("Erik Help") with the following formula in A1:
=ArrayFormula({"Submitted Time","Email","Score";SORT(SPLIT(FLATTEN(QUERY(SORT(TRANSPOSE(SPLIT(TRANSPOSE(QUERY(IF(Sheet1!B2:B=TRANSPOSE(UNIQUE(FILTER(Sheet1!B2:B,Sheet1!B2:B<>""))),Sheet1!A2:A&"|"&Sheet1!B2:B&"|"&Sheet1!C2:C,),,COUNTA(Sheet1!A2:A)))," ",0,1)),SEQUENCE(MAX(COUNTIF(Sheet1!B2:B,Sheet1!B2:B))),0),"LIMIT 10")),"|",1,0),1,0)})
The number of records is set after LIMIT.
The order is set by the final two numbers: 1,0 (meaning "sort by column 1 in reverse order," which, as currently set, is sorting in reverse order by date/time).

Using =averageifs with multiple text criteria in different columns

I'm busy with creating an spreadsheet where I can get the average price of holidaydeals with a specific tag (f.e. a destination). When I do this for only one tag column the formula is working :) :
=averageifs(C4:C10,E4:E10,L1,F4:F10,"Frankrijk vakantie")
But... I have more then 10 tag columns in total and in every column something like "Frankrijk vakantie" could be found. My simple mind thought, okay lets change F4:F10 (in this example) to F4:G10 to look for "Frankrijk vakantie" in two columns. But... the formula didn't work.
Link to spreadsheet: https://drive.google.com/file/d/1Gw5VC5qT1bzbFIPBu5j4dLXBmJjh7GuW/view?usp=sharing
I've also added a screenshot. I hope that someone can help me with this. Would be great, thank you!

In L2 try
=query({C4:C11, ArrayFormula(N(mmult(N(F4:O11="Frankrijk vakantie"), transpose(column(F3:O3)^0))>0))}, "Select AVG(Col1) where Col2 > 0 label AVG(Col1) '' ")
and see if that works?
EDIT: to include a month/year filter try
=query({C4:C11, ArrayFormula(N(mmult(N( (F4:O11="Frankrijk vakantie")*(E4:E11=L1)), transpose(column(F3:O3)^0))>0))}, "Select AVG(Col1) where Col2 > 0 label AVG(Col1) '' ")
The formula makes use of a virtual array, containing the values of column C and the output of the mmult() function. The latter creates a column with 1's if 'Franrkijk vakantie' is found in that row AND the date in column E matches the date in L1. The query then averages the values from column C and filters out the rows where the conditions of the MMULT() are not met.
EDIT 2: To check for a 'double match' in the row, try
=query({C4:E11, transpose(query(transpose(F4:S11),,9^99))}, "Select AVG(Col1) where Col3='"&L1&"' and Col4 contains 'Frankrijk vakantie' and Col4 contains 'Europa vakantie' label AVG(Col1)''", 0)
Change range to suit.

Find frequency of words in a column in Google Sheets and lookup another value from a different column using formulae

I have 2 columns of data in a Google Sheet. Column1 is unique words or sentences (words are repeated in sentences) and the Column2 is a numeric value next to each (say votes). I am trying to get a list of unique words from Column1 and then the sum of values (votes) from Column2 when the word was present either on its own or in a sentence.
Here is a sample of the data I am working with in Google Sheets:
Term Votes
apple 20
apple eat 100
orange 30
orange rules 40
rule why 50
This is what the end result looks like:
Word Votes
apple 120
eat 100
orange 70
rules 40
rule 50
why 50
The way I am doing it now is quite long and I am not sure if this is the best solution.
Here's my solution:
JOIN values in Column1 using a delimiter " " and then SPLIT them using the same delimiter and then TRANSPOSE them into a column all in one step. This way I have a list of all the words used in Column1 in say Column3.
In Column4 pull out all the UNIQUE values and then do a COUNTIF for the unique values from Column3. This way I am able to get the frequency of each unique word by referencing to the lsit of all words.
In order to get the sum of Votes I have to TRANSPOSE Column4 and then QUERY Column1 and Column2 by using dynamic text in the formula. The formula looks like =QUERY(Column1:Column2, "SELECT SUM(Column2) WHERE Column1 CONTAINS '" & referenceToUniqueWord & "'", 1). The reason I have to transpose first is because the query formula outputs 2 cells of data ie Text: sumColumn1 and Number: 'sum of votes'. Since for one cell of unique word I get two cells of data I am not able to drag the formula down and hence I have to do it horizontally.
I finally get three rows of data after the last step:
One row is just transposed Column4 (all the unique words). Second row is just the text sumColumn2 from using the QUERY formula. And third row is the actual sum of votes, resulting from individual QUERY formulae. I then transpose these rows to columns and to get my final table I VLOOKUP the frequency values arrived at earlier.
This approach is lengthy and prone to errors. Also doesn't work if the list is large and in the initial JOIN I get an error of limit 50,000 reached. Any ideas to make it better are welcome. I know this can be done much easier using Scripts but I'd prefer to have it done using only formulae.

try:
=ARRAYFORMULA(QUERY(SPLIT(TRANSPOSE(SPLIT(QUERY(TRANSPOSE(QUERY(
IF(IFERROR(SPLIT(A:A, " "))="",,"♠"&SPLIT(A:A, " ")&"♦"&B:B)
,,999^99)),,999^99), "♠")), "♦"),
"select Col1,sum(Col2)
group by Col1
order by sum(Col2) desc
label sum(Col2)''"))

SUM last N values with criteria

I have simple table that looks like this:
All i need is to SUM points for specific player (John) in his last 3 matches.
I was able to come with this formula:
SUMPRODUCT(LARGE((A2:B="John")*(C2:D);{1;2;3}))
The problem is that instead of what I was looking for, it sums the highest 3 values, that can be anywhere in that range.
Is there some similar formula, that can do only the last 3 matches?

I think a SUMPRODUCT can get you there with some constructed arrays using a COUNTIFS() and ROW() to get the most recent 3.
This formula:
=SUMPRODUCT((COUNTIFS(A:B,G2,ROW(A:B)*{1,1},">="&ROW(A:B)*{1,1})<=3)*(A:B=G2),C:D)
on this sheet I made seems to work.

I thnk I have a formula that gives what you want. It's not pretty, and I'm sure it can be made simpler, but this works:
=query( query(
{ arrayformula( {ROW(A1:A) } ),
query(A1:D,"select A, B, C, D",1)
} , "select * order by Col1 desc",1),
"select Col2, Col3, Col4, Col5
where (Col2 ='John' or Col3 = 'John')
order by Col1 desc limit 3",1)
Basically, it adds the row number as an extra column to the data, so that we can sort the data in reverse order by row number. Then we query the result to find the first three occurences of 'John', in either Col A or Col B.
Here is a sample sheet:
https://docs.google.com/spreadsheets/d/1-mhTb5Cpp3D-1OltlmCfwlmM-vc2OknHxfJAyHD7BjI/edit?usp=sharing
Credit to Erik Tyler for a previous answer on a different question, on how to add the row number to a query.
Edit: Updated the sheet to provide the SUM of John's (or any player's) scores from the last three matches. This can be combined with the previous formula, if you want a single formula to place somewhere. Or will you have a list of all the players, and you'll want their last three scores beside each of their names?
If I can simplify the formula, I'll update it here.
Let me know if you need something more than this, or if this has answered your question.

Approach
I would use the query formula to get the cells that you need so that you can leverage the limit statement.
You should put a column with the indexes so that you can order the cells in descending order and take the first 3.
Given that your table headers are:
+-----------------------------------------------+
| INDEX | NAME 1 | NAME 2 | POINTS 1 | POINTS 2 |
+-----------------------------------------------+
I would use this query to get your desired result:
=SUMPRODUCT(QUERY(A2:E, "Select D * E where B = 'John' or C = 'John'" order by A desc limit 3"))

Google Sheet formula to find the minimal sum of pairs in array

I'm looking for solution for my problem. I have a sheet to summarize lap times for some competition. We make 3 laps in each qualification. We are qualifying to finals by 2 best laps one after another. So we sum first and second lap or second and third lap and then choose the smallest one sum. I've managed to get array of pairs and filter out empty cells (run not finished). Number of pairs may vary form 1 to 20.
Now is my question. How to find the smallest sum of pairs from my array in one elegant formula?
Here is my sample sheet: example sheet

=QUERY(QUERY({A17:B17;B17:C17;D17:E17;E17:F17;G17:H17;H17:I17};
"select Col1+Col2
where Col1 is not NULL
and Col2 is not NULL");
"select min(Col1)
label min(Col1)''")

I know this isn't exactly your question and fair play if it gets marked down, but in your quest for an 'elegant formula', I was wondering if there was a more general way to get the pairs in the first place.
You can do it with by using two ranges offset by one cell together with the mod of the column number:
=ArrayFormula(query(
query({transpose({A17:H17;B17:I17;mod(column(A17:H17),3)})},"select Col1+Col2 where Col1 is not null and Col2 is not null and Col3>0")
,"select min(Col1) label min(Col1) ''"))

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart