Link to doc - https://docs.google.com/spreadsheets/d/1-kEMMiB6mj3lt3DUSDvW3BmdGaADajbGUwD_oZVllWU/edit?usp=sharing
I have written a formula that scans a list of Slack threads, formats the threads into plain text, then counts the frequency of every word in the range of plain text.
The goal behind this formula is to search for 'trending' keywords that frequently pop up in a Slack channel.
As you can see, there are a couple of tabs. The Readme does a decent job of explaining everything, but I'll give the TL;DR here:
The Data tab contains all of the imported Slack information. Data!E2 contains a messy formula of REGEXREPLACE functions that convert the data into plain text.
In the Keywords tab under Keywords!A2 you'll find the 'frequency' counting formula. Here it is:
=ArrayFormula(QUERY(TRANSPOSE(SPLIT(JOIN(" ", Data!E3:E)," ")&{"";""}),"select Col1, count(Col2) where not(Col1 matches'"&JOIN("|", Stopwords!B1:B)&"') group by Col1 order by count(Col2) desc limit 800 ",0))
To clarify: the formula JOINs all the words in the range Data!E3:E together with a space delimiter. Then we SPLIT that string across a number of cells and TRANSPOSE the resulting row into a column starting at Keywords!A3.
Lastly, a QUERY counts the instances of each word and places each keyword's count in column B, starting at Keywords!B3. Keep in mind that the query excludes any word matching the list of stopwords in Stopwords!B1:B, which removes a good deal of text that doesn't need to be counted.
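For readers who want to see the logic outside Sheets, the JOIN, SPLIT, TRANSPOSE and QUERY steps amount to a word-frequency count with a stopword filter. Below is a minimal Python sketch of that idea. `keyword_counts` and the sample rows are hypothetical, and it assumes stopwords are compared as exact whole words (the Sheets version builds a regex with JOIN("|", Stopwords!B1:B) and applies it with `matches`).

```python
from collections import Counter


def keyword_counts(cells, stopwords, limit=800):
    """Hypothetical helper: count word frequencies across text cells, skipping stopwords.

    Rough equivalent of JOIN(" ", ...) -> SPLIT(..., " ") -> QUERY(... group by ...
    order by count desc limit 800).
    """
    # JOIN the non-empty cells with a space, then SPLIT on whitespace
    words = " ".join(cell for cell in cells if cell).split()
    # Drop stopwords (exact whole-word comparison, an assumption of this sketch)
    stop = set(stopwords)
    counts = Counter(word for word in words if word not in stop)
    # order by count desc, limit N (in Python, ties keep first-seen order)
    return counts.most_common(limit)


rows = ["deploy failed again", "deploy the fix", "the fix worked"]
print(keyword_counts(rows, ["the", "again"]))
# [('deploy', 2), ('fix', 2), ('failed', 1), ('worked', 1)]
```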
As you can see, I am now receiving a #REF! error. This wasn't always the case: for a while the formula worked as expected, with words listed in column A of the Keywords tab and their frequencies in column B. The other columns could then run their functions off this list and fill in the rest of the Keywords sheet.
I am curious, though: I noticed that when I set the function range to Data!E3:E3000, every time the function ran it would add another ten or so rows to the range, so it would grow to E3:E3018, E3:E3032, etc. Then it broke, and now it says "Error Result was not automatically expanded, please insert more rows (13)."
If anyone could help me optimize this formula to run a bit better that would be amazing. Thanks!
Sorry that no one has replied earlier. I just found this.
But I think your error is in column B of your Stopwords tab. The formula there should be moved to B1 instead of sitting partway down the page. It is producing the #REF! error, which then appears in your main formula because that formula references the range Stopwords!B1:B.
Let me know if this helps.
You may also want to revise the query string in your formula to end with:
label count(Col2) ''
which removes the header for the count(Col2) column.
I have no idea how to title this post; apologies in advance.
I have several sheets, each with a number in column I and a name centered and merged across columns A:H. I want to get the name from A:H that corresponds to a value in column I, but there are duplicates, so I need the nth match where applicable. The formula I have so far works, except that it does not fill down as an ArrayFormula: when I drag the formula down, I get a #REF! error because, when a duplicate is found, its result cannot overwrite the formula in the cell below.
This will be easier to showcase: LINK TO SHEET.
Essentially, the main sheet collects and sorts all the values in I:I of all the other sheets. Using that column, I want to return the name that corresponds to each value, letting duplicates work themselves out. I believe my issue resides in the $B1 part at the end of the formula, which prevents it from working as an array.
=ARRAYFORMULA(UNIQUE(FILTER({Sheet2!$A$1:$A;Sheet3!$A$1:$A;Sheet4!$A$1:$A},{Sheet2!$I$1:$I;Sheet3!$I$1:$I;Sheet4!$I$1:$I}=$B1)))
Cell F2 on the Sheet1 tab:
=QUERY({Sheet2!A:I;Sheet3!A:I;Sheet4!A:I},"select Col1,Col9 where Col9>0 order by Col9 asc",0)
You can read more about query here.
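The QUERY answer avoids the nth-duplicate lookup by returning each name alongside its own value, so duplicate values never collide. A minimal Python sketch of that stack, filter and sort idea follows; the sheet contents are made up, `make_row` is a hypothetical helper, and each row is assumed to be a list of the nine columns A through I.

```python
def stack_and_sort(*sheets):
    """Stack rows from several sheets and return (name, value) pairs sorted by value.

    Mirrors QUERY({Sheet2!A:I;Sheet3!A:I;Sheet4!A:I},
                  "select Col1,Col9 where Col9>0 order by Col9 asc", 0).
    """
    # {Sheet2!A:I;Sheet3!A:I;...}: the semicolon stacks ranges vertically
    stacked = [row for sheet in sheets for row in sheet]
    # select Col1, Col9 where Col9 > 0 (blank or text cells are skipped)
    pairs = [(row[0], row[8]) for row in stacked
             if isinstance(row[8], (int, float)) and row[8] > 0]
    # order by Col9 asc; Python's sort is stable, so equal values keep stacking order
    return sorted(pairs, key=lambda pair: pair[1])


def make_row(name, value):
    """Hypothetical test row: name in column A, blanks in B:H, value in column I."""
    return [name] + [""] * 7 + [value]


sheet2 = [make_row("Ann", 5), make_row("Bob", 3)]
sheet3 = [make_row("Cy", 5), make_row("", "")]
print(stack_and_sort(sheet2, sheet3))
# [('Bob', 3), ('Ann', 5), ('Cy', 5)]
```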
I am trying to find a formula that will give me the count of unique dates on which a person's name appears in either or both of two different columns.
I have a set of data where a person's name may show up in a "driver" column or a "helper" column, multiple times over the course of one day. Throughout the day some drivers might also be helpers, and on some days a driver may come in for duty but only as a helper. Basically, all drivers can be helpers, but not all helpers can be drivers.
I've attached a link to a sample sheet for more clarity.
https://docs.google.com/spreadsheets/d/1GqNa1hrViX4B6mkL3wWcqEsy87gmdw77DhkhIaswLyI/edit?usp=sharing
I've created a REPORTS tab with a SORT(UNIQUE(FLATTEN)) Formula to give me a list of the names that appear in the DATA Tab.
I'm looking for a way to count the unique dates on which each name (Column A of the REPORTS tab) appears in either of the two columns (Column B and/or C of the DATA tab), to determine the total number of days worked so I can calculate the total number of days off over the range queried.
I've tried several iterations of countif, countunique, and countuniqueifs but cannot seem to find a way to return the correct values.
Any advice on how to make this work would be appreciated.
I think if you put this formula in cell B7 you'll be set. You can drag it down.
=Counta(Unique(filter(DATA!A:A,(DATA!C:C=A7)+(DATA!B:B=A7))))
Here's a working version of your file.
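The FILTER answer keeps the dates on which the name appears in either column, then UNIQUE and COUNTA count the distinct dates. The same logic as a Python sketch, assuming each DATA row is a (date, driver, helper) tuple; `days_worked` and the sample data are hypothetical.

```python
def days_worked(rows, name):
    """Count distinct dates on which `name` appears as driver or helper.

    Mirrors COUNTA(UNIQUE(FILTER(DATA!A:A, (DATA!C:C=name)+(DATA!B:B=name)))).
    """
    # The set plays the role of UNIQUE; len() plays the role of COUNTA
    return len({date for date, driver, helper in rows
                if driver == name or helper == name})


data = [
    ("2021-03-01", "Al", "Bo"),
    ("2021-03-01", "Bo", "Al"),  # same day again: counted once
    ("2021-03-02", "Cy", "Al"),
    ("2021-03-03", "Bo", "Cy"),
]
print(days_worked(data, "Al"))  # 2
```

Days off over the period would then be the number of days in the period minus this count.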
For anyone interested: Google Sheets' FILTER function differs slightly from Excel's FILTER function, because Sheets tries to make it easier to apply multiple conditions by simply separating each condition with a comma. For example, =filter(A:A,A:A<>"",B:B<>"bad result") will produce different results in Sheets and Excel (in Excel, the third argument is treated as the value to return when nothing matches).
Excel's FILTER requires multiple conditions to be combined into a single include argument: wrap each condition in parentheses, then join them with + for an OR condition or * for an AND condition. Multiplying arrays of TRUE/FALSE values can look daunting and bizarre, but it allows for more flexibility.
To Google's credit, if one follows the Excel syntax (as I did in this answer), the functions behave the same.
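To make the + and * convention concrete: each condition produces a TRUE/FALSE array, TRUE and FALSE behave as 1 and 0 in arithmetic, and FILTER keeps a row when the combined value is non-zero. A small Python sketch of that rule (the `combine` helper is hypothetical):

```python
def combine(cond_a, cond_b, op):
    """Combine two per-row TRUE/FALSE arrays as Excel-style FILTER criteria do.

    op "+" acts as OR, op "*" acts as AND, with TRUE=1 and FALSE=0.
    """
    if op == "+":
        scores = [int(a) + int(b) for a, b in zip(cond_a, cond_b)]
    elif op == "*":
        scores = [int(a) * int(b) for a, b in zip(cond_a, cond_b)]
    else:
        raise ValueError("op must be '+' or '*'")
    # A row is kept when its score is non-zero (a row matching both sides scores 2 under +)
    return [score != 0 for score in scores]


a = [True, True, False, False]
b = [True, False, True, False]
print(combine(a, b, "+"))  # [True, True, True, False]
print(combine(a, b, "*"))  # [True, False, False, False]
```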
Delete what you have and use:
=QUERY(QUERY(UNIQUE({DATA!A:B; DATA!A:A, DATA!C:C}),
"select Col2,count(Col1),"&D2&"-count(Col2)
where Col2 is not null
group by Col2"),
"offset 1", 0)
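This QUERY stacks (date, driver) and (date, helper) pairs, de-duplicates them with UNIQUE, then counts dates per name and subtracts that count from D2, which is assumed here to hold the total number of days in the period. A Python sketch of the same pipeline, with a hypothetical `days_report` helper and made-up data:

```python
from collections import Counter


def days_report(rows, total_days):
    """Return {name: (days_worked, days_off)} from (date, driver, helper) rows.

    Mirrors QUERY(UNIQUE({DATA!A:B; DATA!A:A, DATA!C:C}),
                  "select Col2, count(Col1), D2-count(Col2) where Col2 is not null group by Col2").
    """
    # {DATA!A:B; DATA!A:A, DATA!C:C}: stack (date, driver) on top of (date, helper)
    pairs = [(d, driver) for d, driver, _ in rows] + [(d, helper) for d, _, helper in rows]
    # UNIQUE removes repeated (date, name) pairs; "is not null" drops blank names
    worked = Counter(name for _, name in set(pairs) if name)
    # group by Col2 returns names in ascending order; days off = total - days worked
    return {name: (n, total_days - n) for name, n in sorted(worked.items())}


data = [
    ("2021-03-01", "Al", "Bo"),
    ("2021-03-01", "Bo", "Al"),
    ("2021-03-02", "Al", ""),
    ("2021-03-03", "Cy", "Al"),
]
print(days_report(data, 3))
# {'Al': (3, 0), 'Bo': (1, 2), 'Cy': (1, 2)}
```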
When I use the QUERY() function and a date or number column has some blanks at the top, it groups all the rows down to the first date/number entry into the first row, together with the headers. I need it to treat every row after the header as an individual row, regardless of blanks. I'm assuming it treats those rows as not being part of the data because they don't have values in certain columns; however, it seems to make this decision sporadically. See the image or link for context.
My attempts at resolving this so far:
Removing the labels at the top, i.e. =QUERY(A1:C, "SELECT * label A ''"), but that kept the grouping.
ORDER BY doesn't help, as those top rows seem to be exempt, treated as part of the header.
Inserting a column of numbers to trick it into thinking there are numbers; it ignores this wherever I position it (hence the "sporadic" comment).
I am aware that I could write a Google Apps Script to resolve this; however, I'm trying to keep the skill required for adapting the process at a level that others can manage. The data comes from an API, so I can't order or format it until it is in the spreadsheet.
This is the only blocker from me fully automating several processes so I'd appreciate any help in finding a workaround or a solution. :)
Image: Cell E1 is =QUERY(A1:C, "SELECT *") and you can see A2 to A5 are shoved unceremoniously in with the header. Solutions?
https://docs.google.com/spreadsheets/d/1MU35HrkRxyHQaliQgKxqeHBViulMnRmPN9UUO7kq0ts/edit?usp=sharing
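Using the optional headers argument should solve this. When headers is omitted (or set to -1), QUERY guesses how many rows at the top are headers based on the content of the data, which is how blank cells at the top of a date or number column end up merged into the header. Setting it to 1 tells QUERY that only the first row is a header. See if this helps: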
=QUERY(A1:C, "SELECT *", 1)
I am looking to find the average of a running list with values broken down by month.
The problem I'm having is excluding the current month from the formula.
I've tried using =indirect and =counta, but only got errors.
The closest I've been able to get is by using the sum and divide method, but it does not produce consistently accurate results.
https://docs.google.com/spreadsheets/d/1YH8vlvGAoZ9Z-uJTdesgmhX8t6pz3JoEhqi4t9-APSE/edit?usp=sharing
Any guidance is appreciated. The sheet is open for comments if it is easier to answer that way.
It's a little more than what you're specifically asking for here, but maybe this gets at what you're really going for. Take a look at a new tab called MK.help, where you will find this single formula in cell A1 that populates the whole table:
=ARRAYFORMULA(QUERY({IFERROR(EOMONTH(Expenses!A:A,-1)+1),Expenses!B:C},"select Col1,AVG(Col2) where Col1<"&EOMONTH(TODAY(),-1)&" and Col1 is not null group by Col1 pivot Col3 label Col1'Month-Year'"))
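The key trick in that formula is EOMONTH(date,-1)+1, which maps every date to the first day of its month so rows can be grouped by month, while the where clause drops the current month. A Python sketch of the same grouping, assuming each Expenses row is (date, amount, category); the helper names and data are hypothetical.

```python
from collections import defaultdict
from datetime import date


def month_start(d):
    """EOMONTH(d, -1) + 1: the first day of d's month."""
    return d.replace(day=1)


def monthly_averages(rows, today):
    """Average amount per (month, category), excluding the current month.

    Roughly "select Col1, AVG(Col2) where Col1 < EOMONTH(TODAY(),-1) group by Col1 pivot Col3".
    Because Col1 holds month starts, "< last day of last month" is the same as
    "< first day of this month".
    """
    cutoff = month_start(today)
    groups = defaultdict(list)
    for d, amount, category in rows:
        month = month_start(d)
        if month < cutoff:
            groups[(month, category)].append(amount)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


expenses = [
    (date(2024, 1, 5), 10, "Food"),
    (date(2024, 1, 20), 20, "Food"),
    (date(2024, 2, 3), 30, "Food"),
    (date(2024, 2, 10), 5, "Gas"),
    (date(2024, 3, 1), 100, "Food"),  # current month: excluded
]
print(monthly_averages(expenses, today=date(2024, 3, 15)))
```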
Is there a formula to randomize the order of a column of data, keeping each item represented exactly once (i.e., the same items, just shuffled)?
So:
APPLES
PEARS
BERRIES
Might come out as
PEARS
BERRIES
APPLES
RANDBETWEEN formulas are no good here, as you might get two PEARS.
There is a new "randomize range" feature available in the context menu after selecting a range.
The following approach implements the idea of pnuts, but without creating a column filled with random numbers:
=query({A2:A20, arrayformula(randbetween(0, 1e20 + row(A2:A20)))}, "select Col1 order by Col2", 0)
Here A2:A20 is the range to be permuted. The arrayformula generates a random integer for each entry. The query sorts the array by those random integers, but does not put the random numbers in the spreadsheet.
The entropy of randbetween is 64 bits, so collisions are extremely unlikely. And even if two random numbers happen to be equal, that will not generate repetitions; sorting by any column never does that. It only means the corresponding pair of entries will appear in their original order.
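The same sort-by-random-key idea in Python, as a sketch: each item gets a random integer key between 0 and 10**20, and sorting by the key reorders the items without duplicating or dropping any. The helper name is hypothetical.

```python
import random


def shuffle_by_random_keys(items, rng=random):
    """Permute items by sorting them on random integer keys.

    Like query({A2:A20, arrayformula(randbetween(0, 1e20 + row(A2:A20)))},
               "select Col1 order by Col2", 0).
    """
    keyed = [(rng.randint(0, 10**20), item) for item in items]
    # Sort on the key only; the sort is stable, so tied keys keep their original order
    keyed.sort(key=lambda pair: pair[0])
    return [item for _, item in keyed]


fruit = ["APPLES", "PEARS", "BERRIES"]
print(shuffle_by_random_keys(fruit))
```

Passing a seeded `random.Random(...)` as `rng` makes the order reproducible, which is handy for testing.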
Came across this while looking for a formula to generate a set of random unique integers and ended up devising my own, so I'm leaving it here for anyone else looking for the same:
=SORT(SEQUENCE(A$1),RANDARRAY(A$1),FALSE) where A$1 is the count of integers to generate (expressed here as a cell reference because I like to create sheets where I can input a number in a cell rather than changing the formula, but this can of course be just a number.)
This can be expanded by adding the three other fields to SEQUENCE as explained in the function's documentation, or by wrapping it in an ARRAYCONSTRAIN to limit the count of entries returned without changing the minimum or maximum values of the generated entries. Hope all this makes sense!
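A Python sketch of the SORT(SEQUENCE(n), RANDARRAY(n), FALSE) approach, including the ARRAYCONSTRAIN-style cut to the first k values; `random_unique_integers` is a hypothetical name and its defaults are assumptions.

```python
import random


def random_unique_integers(n, k=None, rng=random):
    """Return the integers 1..n in random order, optionally only the first k.

    SEQUENCE(n) -> 1..n, RANDARRAY(n) -> one random key per value,
    SORT(..., FALSE) -> descending by key, ARRAYCONSTRAIN(..., k, 1) -> first k.
    """
    keys = [rng.random() for _ in range(n)]
    values = sorted(range(1, n + 1), key=lambda v: keys[v - 1], reverse=True)
    return values if k is None else values[:k]


print(random_unique_integers(10, k=4))  # four distinct values from 1..10
```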
I adopted a similar approach to user6655984 before I found this post.
RANDARRAY seemed like a neat, call-once solution.
I had similar demands: formula-based, a randomized return order, and the ability to return only unique records or not, as the whim took me.
Right-clicking to randomize the range meant user interaction I didn't want, and the data is dynamic.
I built the random numbers into a query data range on the fly.
This gives me the flexibility of QUERY (I can easily expand the range, add returned columns, filter criteria, etc.), I don't have to show the random numbers at all, I can wrap it in UNIQUE if desired, and it re-randomizes with each recalculation.
Put some data in A2:A.
To see the inline data range:
={RANDARRAY(ROWS($A$2:$A)),$A$2:$A}
Query (including duplicates), filtering out empty rows:
=QUERY({RANDARRAY(ROWS($A$2:$A)),$A$2:$A},"SELECT Col2 WHERE COL2<>'' ORDER BY Col1 ",0)
Same, but wrapped in UNIQUE:
=UNIQUE(QUERY({RANDARRAY(ROWS($A$2:$A)),$A$2:$A},"SELECT Col2 WHERE COL2<>'' ORDER BY Col1 ",0))
Hope it helps someone, even if years later. :)
Matt