Thanks for the help. I have a large-ish data set where I am trying to query a long column of 2-3 word phrases.
I am using the following code to try and pick out the frequency of the repeated words. Example data below.
My issue is that the code is not resolving. I think it is because there are some special characters in the data:
some Japanese, some copyright signs, URLs, and Greek symbols.
1) Is there a way to easily remove rows with special characters?
2) Am I doing something else incorrectly?
3) How would I do the same frequency formula I have here, but with two-word phrases and three-word phrases?
=ArrayFormula(QUERY(TRANSPOSE(SPLIT(JOIN(" ";B3:B);" ")& .
{"";""});"select Col1, count(Col2) group by Col1 order by count(Col2)
desc limit 10 label Col1 'Word', count(Col2) 'Frequency'";0))
I received the code from here, btw.
Google Docs spreadsheet formula for most frequent keywords
Besides the extra "." the formula seems to refer to the wrong column. Try this:
=ArrayFormula(QUERY(TRANSPOSE(SPLIT(JOIN(" ",A:A)," ")&{"";""}),"select Col1, count(Col2) group by Col1 order by count(Col2) desc label Col1 'Word', count(Col2) 'Frequency'",0))
This also counts all, not just the top 10.
I don't think this approach will work for two-word phrases.
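To make the logic of the formula easier to check outside Sheets, here is a minimal Python sketch of the same idea: join the rows, split into words, count, and sort by frequency. It also drops rows containing special characters, which is what question 1 asks about. The sample rows, the function names, the "plain characters" rule (ASCII letters, digits, spaces, apostrophes), and the lowercasing are all assumptions for illustration, not part of the original formula.

```python
import re
from collections import Counter

# Hypothetical sample rows standing in for the phrases in column A.
rows = ["blue widget", "blue widget sale", "café ©2020", "red widget", ""]

def is_plain(text):
    """True if the row holds only ASCII letters, digits, spaces, apostrophes."""
    return re.fullmatch(r"[A-Za-z0-9' ]+", text) is not None

def word_frequency(rows, limit=None):
    """Count words across all plain rows, most frequent first."""
    words = []
    for row in rows:
        if is_plain(row):  # skip empty rows and rows with special characters
            words.extend(row.lower().split())
    return Counter(words).most_common(limit)

print(word_frequency(rows, limit=10))
```

Passing limit=None returns every word, which mirrors the answer's formula without the "limit 10" clause.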
So I have this formula, and it works great: it finds and counts instances where a name appears with a given text from AD:AD in the previous column, which has square brackets.
=IFERROR(INDEX(QUERY({regexreplace('All Report Sheet'!$B$1:$B, "\[|\]",),'All Report Sheet'!$C$1:$C},"select count(Col1) where (Col1 matches '"&JOIN("|",REGEXREPLACE($AD$2:$AD, "\[|\]",))&"') and Col2='"&$A14&"' label count(Col1) ''")))
I have recently seen that it is not counting names with regular brackets (parentheses), so I tried to tweak it into the formula below to allow for that, but I'm having no luck at all and I'm not sure what the issue is.
=IFERROR(INDEX(QUERY({regexreplace('All Report Sheet'!$B$1:$B, "\[\]\(\)",),'All Report Sheet'!$C$1:$C},"select count(Col1) where (Col1 matches '"&JOIN("|",REGEXREPLACE($AD$2:$AD, "\[\]\(\)",))&"') and Col2='"&$A65&"' label count(Col1) ''")))
Any help is appreciated.
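One likely cause is visible in the pattern itself: "\[|\]" is an alternation (match "[" or "]"), while "\[\]\(\)" has no "|" and so matches only the literal four-character sequence "[]()". The Python sketch below shows the difference on a made-up name. It assumes that Sheets' regex engine treats these simple patterns the same way Python's re module does; the sample string is hypothetical.

```python
import re

name = "[Alice] (Smith)"  # hypothetical cell value

# Pattern from the second formula: a literal "[]()" sequence, which
# does not occur in the name, so nothing is removed.
sequence = re.compile(r"\[\]\(\)")

# Character class: any single one of [ ] ( ).
char_class = re.compile(r"[\[\]()]")

# Alternation, extending the style of the original working pattern.
alternation = re.compile(r"\[|\]|\(|\)")

print(sequence.sub("", name))     # brackets are still there
print(char_class.sub("", name))   # brackets removed
print(alternation.sub("", name))  # brackets removed
```

Either of the last two patterns could be tried in both REGEXREPLACE calls in place of "\[\]\(\)".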
I'm looking for an efficient way to gather and aggregate some data in Google Sheets. I've been looking at the query function, pivot tables, and Index + Match formulas, but so far I've not found a way that brings me to the result I'm looking for. I have a set of data which looks more or less as follows.
The fields with an X represent irrelevant data which I don't want to show up in my end result. They only serve to illustrate that there are columns of data that I don't want in between the columns of data that I do want. The data in those columns is of varying types and of varying values per type; they are not actually fields with an "X" in them. Only the fields with numbers are of interest, along with the related names at the top and left of those. The intent is to create a list that looks more or less like this.
I've highlighted those yellow fields because that data has been aggregated. For example, in the original file field D3 shows a relation between Laura and Pete with the number 1, and field L3 also shows a relation between Laura and Pete, so the number in that field is to be added to the number in the other field resulting in an aggregated total of 2 for that particular combination.
I would really appreciate any suggestions that can help me get to an elegant and efficient solution for this. The only solutions I can come up with would involve multiple "in-between" sheets and there just has to be a better way.
UPDATE:
Solved by applying the solution in player0's answer. I just had to switch around the order of Col1 and Col2 in the formula to get the table sorted the way I needed it. Formula looks like below now. Many thanks to both player0 and Erik Tyler for their efforts.
=INDEX(QUERY(SPLIT(FLATTEN(A2:A&"×"&D1:N1&"×"&D2:N), "×"),
"select Col2,Col1,sum(Col3)
where Col2 is not null
and Col3 is not null
group by Col2,Col1
label sum(Col3)''", ))
try:
=INDEX(QUERY(SPLIT(FLATTEN(A2:A&"×"&D1:N1&"×"&D2:N), "×"),
"where Col3 is not null and Col2 is not null", ))
update:
=INDEX(QUERY(SPLIT(FLATTEN(A2:A&"×"&D1:N1&"×"&D2:N), "×"),
"select Col1,Col2,sum(Col3)
where Col3 is not null
and Col2 is not null
group by Col1,Col2
label sum(Col3)''", ))
Given your current data set (which only appears to extend to Col N), place the following somewhere to the right of Col N:
=ArrayFormula(SPLIT(TRANSPOSE(QUERY(TRANSPOSE(QUERY(SPLIT(QUERY(FLATTEN(FILTER(IF(NOT(ISNUMBER(D2:N)),,D1:N1&"~ "&A2:A&"|"&D2:N),A2:A<>"")),"Select * WHERE Col1 Is Not Null"),"|"),"Select Col1, SUM(Col2) GROUP BY Col1 LABEL SUM(Col2) ''")&"~ "),,2)),"~ ",0,1))
It would be better if this were placed in a different sheet from the original data. Supposing that your original data sheet is named Sheet1, place the following version of the above formula into a new sheet:
=ArrayFormula(SPLIT(TRANSPOSE(QUERY(TRANSPOSE(QUERY(SPLIT(QUERY(FLATTEN(FILTER(IF(NOT(ISNUMBER(INDIRECT("Sheet1!D2:"&ROWS(Sheet1!A:A)))),,Sheet1!D1:1&"~ "&Sheet1!A2:A&"|"&INDIRECT("Sheet1!D2:"&ROWS(Sheet1!A:A))),Sheet1!A2:A<>"")),"Select * WHERE Col1 Is Not Null"),"|"),"Select Col1, SUM(Col2) GROUP BY Col1 LABEL SUM(Col2) ''")&"~ "),,2)),"~ ",0,1))
This separate-sheet approach and formula allows for the original data to extend indefinitely past Col N.
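Both answers follow the same steps: flatten the grid into (row name, column name, value) triples, keep only the numeric values, then sum per name pair. A rough Python sketch of that logic, with made-up names and data (the header list, the rows dict, and the function name are all hypothetical):

```python
from collections import defaultdict

# Hypothetical grid: header names across the top, one list of cells per row name.
header = ["Pete", "X", "Anna", "Pete"]
rows = {"Laura": [1, "x", None, 1], "Tom": [None, "y", 2, None]}

def aggregate(header, rows):
    """Sum the numeric cells for each (row name, column name) pair."""
    totals = defaultdict(int)
    for row_name, cells in rows.items():
        for col_name, value in zip(header, cells):
            # Keep numbers only; text and blanks are the unwanted cells.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                totals[(row_name, col_name)] += value
    return sorted((r, c, total) for (r, c), total in totals.items())

print(aggregate(header, rows))
```

Here the two Laura/Pete cells add up to 2, matching the aggregation described in the question.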
TL;DR
Trying to figure out how to exclude words from a "find key words" formula in Google Sheets to ensure relevant keywords for SEO.
First question here - exciting, right 🤩! So, I have a spreadsheet with social media posts and I want to get a sense of which are the most commonly used words. I have used the following formula with excellent results:
=ARRAYFORMULA((Query(TRANSPOSE(SPLIT(JOIN(" ",B7:B26), " ")&{"";""}),"select Col1, count(Col2) group by Col1 order by count(Col2) desc limit 20 label Col1 'Word', count (Col2) 'Frequency'",0)))
However, here is the question: I want to exclude commonly used words such as "a", "the", "to"; well, you get the point. I have yet to figure out how to do this. Ideally, I would have a separate sheet with an "exclusion list" of such words to be removed.
Thanks so much for taking time to help!
Best!
Try adding a where clause with a regex, where "A2:A" is your exclusion list:
=ARRAYFORMULA((QUERY(TRANSPOSE(SPLIT(JOIN(" ", B7:B26), " ")&{""; ""}),
"select Col1,count(Col2)
where not lower(Col1) matches '^"&TEXTJOIN("$|^", 1, LOWER(A2:A))&"$'
group by Col1
order by count(Col2) desc
limit 20
label Col1'Word',count(Col2)'Frequency'", 0)))
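The where clause compares the lowercased word against the exclusion list and drops any exact match before counting. A small Python sketch of the same filtering, using made-up posts and a made-up exclusion set (all names here are hypothetical):

```python
from collections import Counter

# Hypothetical posts and exclusion list.
posts = ["The launch is live", "We love the launch", "Going to the launch party"]
excluded = {"a", "the", "to", "is", "we"}

def top_words(posts, excluded, limit=20):
    """Count words, skipping any whose lowercase form is in the exclusion list."""
    words = (w for post in posts for w in post.split())
    kept = [w for w in words if w.lower() not in excluded]
    return Counter(kept).most_common(limit)

print(top_words(posts, excluded))
```

As in the formula, only the comparison is case-insensitive; the words themselves are counted as written.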
Hi everyone,
I have a set of raw data, and I use the QUERY function in cell C2 to order and count it. May I know how to include the ROUND function in the QUERY so that the output in column C has only 1 decimal place? The reason I'm doing this is to reduce the number of bars in the bar chart. As you can see in the chart, 50.79 and 50.8 are treated as 2 bars, but it would be more presentable if I combined them by rounding 50.79 up to 50.8.
*Preferably doing the rounding in QUERY instead of creating another column
This is my spreadsheet:
https://docs.google.com/spreadsheets/d/12enDKh4hDE67XyvA-21_0CeVxNojzMmNWGjRZfCFmQE/edit#gid=0
Any help will be greatly appreciated!
I have added two sheets ("Erik Help" and "Erik Help 2").
Yours is a case where pre-processing the QUERY data will be beneficial. Note that doing so creates a virtual array that is no longer able to be referenced by column letter; rather, Colx notation is required in the Select clause.
The formula in "Erik Help" produces exactly the results you requested in your post:
=query(FILTER(ROUND(A2:A,1),A2:A<>""),"select Col1, count(Col1) where Col1 is not null group by Col1 order by Col1 asc label Col1 'Values', count(Col1) 'Count'")
The formula in "Erik Help 2" refines the data by rounding every number in A2:A to the nearest 0.5 (which you could change to 0.2 or 0.25 or whatever you like). You can use this option depending on how discrete you need your results to be:
=query(FILTER(MROUND(A2:A,0.5),A2:A<>""),"select Col1, count(Col1) where Col1 is not null group by Col1 order by Col1 asc label Col1 'Values', count(Col1) 'Count'")
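The effect of the two formulas on the bar count can be checked with a short Python sketch: round first, then count. The sample values and function names are made up. Note one assumption: Python's round() resolves exact ties to the nearest even digit, which may differ from Sheets' ROUND and MROUND on values that sit exactly halfway, so the sample avoids such values.

```python
from collections import Counter

values = [50.79, 50.8, 50.31, 51.2]  # hypothetical raw data

def count_rounded(values, digits=1):
    """Round to a fixed number of decimals, then count (like ROUND)."""
    return sorted(Counter(round(v, digits) for v in values).items())

def count_nearest(values, step=0.5):
    """Round to the nearest multiple of step, then count (like MROUND)."""
    return sorted(Counter(round(v / step) * step for v in values).items())

print(count_rounded(values))
print(count_nearest(values))
```

With one decimal place, 50.79 and 50.8 fall into the same group; with a step of 0.5 the groups become coarser still.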
Please see: Extracting and counting unique word frequency from a range
In that question the asker was seeking unique single words.
I'm trying to accomplish the same but finding every unique pair of words.
If a cell doesn't have two words, then it doesn't have any entries.
If a cell has 3 words, then it would have two combinations: A + B and B + C.
I've tried to parse with splits and to substitute pipes for spaces, using len(cell) - len(substitute(cell," ","")) to count the spaces (one fewer than the number of words), but that doesn't work either.
Try this (assuming your words are in Column A, put this in cell B1):
=index(
query(
query(
trim(iferror(flatten(split(
regexreplace(regexreplace(lower(A:A),"[^A-Za-z\ \']+",""),"([\w\']+\ [\w\']+)","$1,")
&","&
regexreplace(regexreplace(lower(if(len(A:A)=len(substitute(A:A," ",""))+1,,A:A)),"[^A-Za-z\ \']+",""),"\w*\ ([\w\']+\ [\w\']+)","$1,")
,",",1,1)),)),
"where Col1 like '% %' order by Col1",1),
"select Col1, count(Col1) group by Col1 label Col1 'Word pairs', count(Col1) 'Qty'",0)
)
It's quite involved, but I'll break it down if it works for you!
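For comparison, the result the formula is aiming for can be described in a few lines of Python: clean each cell, pair every word with the word that follows it, and count the pairs. This is only a sketch of the intended output, not a translation of the formula; the sample cells and function names are hypothetical, and the cleaning rule (lowercase, keep letters, spaces, and apostrophes) is borrowed from the formula's first regexreplace.

```python
import re
from collections import Counter

cells = ["red apple pie", "red apple", "pie", "Red apple tart"]  # hypothetical

def word_pairs(cell):
    """Return every adjacent two-word combination in one cell."""
    cleaned = re.sub(r"[^a-z' ]+", "", cell.lower())
    words = cleaned.split()
    # zip pairs each word with its successor: A+B, B+C, ...
    return [f"{a} {b}" for a, b in zip(words, words[1:])]

def pair_counts(cells):
    """Count adjacent word pairs across all cells."""
    return Counter(pair for cell in cells for pair in word_pairs(cell))

print(pair_counts(cells).most_common())
```

A one-word cell produces no pairs, and a three-word cell produces two, as the question requires.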