How to Find out if a column contains any duplicates - google-sheets

I have a column of numbers. I want to know if there are any duplicates. I don't need to know how many or what their value is. I just want to know if there are any.
The best way I could figure out was to have another column of equal height to the column of numbers, with the formula:
=countif(A:A,A1)>1
So this will put a TRUE next to every number that has one or more duplicates in the list.
From here I need to see if this second column contains a TRUE.
So I have a final cell with this formula in it:
=lookup(true, B:B)
This always displays FALSE, even when there are duplicates in the list, with corresponding "TRUE" values next to them in column B.
Also, is there a simpler way of solving this problem?
Note: I can get it to work if the single cell result simply does an =OR(B:B) but I still want to know why my first way won't work and if there is an all around simpler way of doing this.

you can use both =unique(A:A) and also =counta(unique(A:A))
note: the A:A is just a dummy array i threw in for example, replace with whatever column you want to refer to.
to get a final yes or no, you could nest it together by putting =if(eq(counta(A:A),counta(unique(A:A))),"No Duplicates", "Contains Duplicates")

I'm not sure whether simpler (I am confident the formula could be simplified!) but copy/pasting the following might be deemed so:
=sum(if(ARRAYFORMULA(countif(A:A,A1:A)>1),1,0))
This should return 0 only if there are no duplicates. If a single entry is repeated twice (three instances) and all other values are unique, the result should be 3.
TRUE is curious as the behaviour is not what I expected and I differs from Excel where true would be converted to TRUE, which normally indicates an automatic change from text to function. I don't have an explanation but it may be connected with lookup because the boolean behaves as I would expect in say an if formula.

Related

How to create 2 query options inside an IF function in google sheets

I am trying to create an if function and if true execute one query if false execute another query, but I keep on having a 'Array Literal was missing values for one or more rows' error, the funtion goes like this:
=ARRAYFORMULA({"Cases";IF(H2="SFDC",QUERY(OPTION1),QUERY(OPTION2))})
Any ideia why, is there any other way to construct a conditional like this?
Thanks in Advance.
You need to make sure that our output as the same number of columns. For example, in your equation, you have one column of data (the header "Cases"). Because of this, the result of your IF statement must also have only one column of data.
One way to fix this is to define some extra empty columns. For example:
=ARRAYFORMULA({"Cases","","";IF(H2="SFDC",QUERY({1,2,3},"select *"),QUERY({4,5,6},"select *"))})
I added two more columns after Cases, therefore allowing my resulting query to properly expand.
I managed to solve the issue, I was not specifying the header at the end of the query which was generating a conflict.
Thanks to all.
This error is produced When two arrays have different numbers of columns, in you case make sure the length of array in QUERY(OPTION1) is the same length as
the one in QUERY(OPTION2).
use this formula to check array length.
for QUERY(OPTION1)
=COUNTA(SPLIT(QUERY(OPTION1),"/"))
and this for QUERY(OPTION2)
=COUNTA(SPLIT(QUERY(OPTION2),"/"))
Replace it with your formula of course.

Is there a way to specify an input is a single cell in Google Sheets?

I want to iterate over an array of cells, in this case B5:B32, and keep the values that are equal to some reference text in a new array.
However, SPLIT nowadays accepts arrays as inputs. That means that if I use the array notation of "B5:B32" within ARRAYFORMULA or FILTER, it treats it as a range, rather than the array over which we iterate one cell at a time.
Is there a way to ensure that a particular range is the range over which we iterate, rather than the range given at once as an input?
What I considered was using alternative formulations of a cell, using INDEX(ROW(B5), COLUMN(B5)) but ROW and COLUMN also accept array values, so I'm out of ideas on how to proceed.
Example code:
ARRAYFORMULA(
INDEX(
SPLIT(B5:B32, " ", 1), 1
) = "Some text here"
)
Example sheet:
https://docs.google.com/spreadsheets/d/1H8vQqD5DFxIS-d_nBxpuwoRH34WfKIYGP9xKKLvCFkA/edit?usp=sharing
Note: In the example sheet, I can get to my desired answer if I create separate columns containing the results of the SPLIT formula. This way, I first do the desired SPLITS, and then take the values I need from that output by specifying the correct range.
Is there a way to do this without first creating an output and then taking a cell range as an input to FILTER or other similar functions?
For example in cell C35 I've already gotten the desired SPLIT and FILTER done in one go, but I'd still need to find a way to sum up the values of the first character of the second column. Doing this requires that I take the LEFT value of the second column, but for that I need to output the results and continue in a new cell. Is there a way to avoid this?
Ralph, I'm not sure if your sample sheet really reflects what you are trying to end up with, since, for example, I assume you are likely to want the total of the hours per area.
In any case, this formula extracts all of the areas, and the hours worked, and is then easy to do further calculations with.
=ArrayFormula({REGEXEXTRACT({C5:C9;D5:D9;E5:E9;F5:F9;G5:G9;H5:H9},"(.*) \d"),
VALUE(REGEXEXTRACT({C5:C9;D5:D9;E5:E9;F5:F9;G5:G9;H5:H9}," (\d+)hrs"))})
Try that in cell E13, to see the output.
The first REGEXEXTRACT pulls out all the text in front of the first space and number, and the second pulls out all the digits in a string of " #hr" in each cell. These criteria could be modified, if necessary, depending on your actual requirements. Note that it requires the use of VALUE, to convert the hours from text to numeric values, since REGEXEXTRACT produces text (string) results.
It involved concatenating your multiple data columns into one long column of data, to make it simpler to process all the cells in the same way.
This next formula will give you a sum, for whatever matching room/task you type into B6, as an example.
=ArrayFormula(QUERY({REGEXEXTRACT({C5:C9;D5:D9;E5:E9;F5:F9;G5:G9;H5:H9},"(.*) \d"),
VALUE(REGEXEXTRACT({C5:C9;D5:D9;E5:E9;F5:F9;G5:G9;H5:H9}," (\d+)hrs"))},
"select Col1, sum(Col2) where Col1='"&B6&"' group by Col1 label sum(Col2) '' ",0))
I will also answer my own question given what I know from kirkg13's answer and other sources.
Short answer: no, there isn't. If you want to do really convoluted computations with particular cell values, there are a few options and tips:
Script your own functions. You can expand INDEX to accept array inputs and thereby you can select any set of values from an array without outputting it first. Example that doesn't use REGEXMATCH and QUERY to get the SUM of hours in the question's example data set: https://docs.google.com/spreadsheets/d/1NljC-pK_Y4iYwNCWgum8B4NJioyNJKYZ86BsUX6R27Y/edit?usp=sharing.
Use QUERY. This makes your formula more convoluted quite quickly, but is still a readable and universally applicable method of selecting data, for example particular columns. In the question's initial example, QUERY can retrieve only the second column just like an adapted INDEX function would.
Format your input data more effectively. The more easily you can get numbers from your input, the less you have to obfuscate your code with REGEXMATCHES and QUERY's to do computations. Doing a SUM over a RANGE is a lot more compact of a formula than doing a VALUE of a LEFT of a QUERY of an ARRAYFORMULA of a SPLIT of a FILTER. Of course, this will depend on where you get your inputs from and if you have any say in this.
Also, depending on how many queries you will run on a given data set, it may actually be desirable to split up the formula into separate parts and output partial results to keep the code from becoming an amalgamation of 12 different queries and formulas. If the results don't need to be viewed by people, you can always choose to hide specific columns and rows.

How do you refer to all cells above the cell inside a match function?

Here's what I'm trying to do: I have a list of courses in Google sheets to which items will be added daily by pasting them at the end of the current list. I then want to check if a course has already been added to the list before. If that is the case, the value of a certain column will get the value that the first occurrence of that course has. So I will be using =INDEX(MATCH... and the range in the match part will be all the cells in the column above the cell that is checked. The MATCH part of the formula is obvious if you drag or copy the formula downwards:
= MATCH(A2; A$1:A2; 0)
However I was experimenting with other ways of achieving this, namely
MATCH(A2; "A$1:A"&ROW()-1; 0)
(I also tried this one without the quotation marks) and
MATCH(A2; (ADDRESS(1;1;2)&":"&ADDRESS(ROW()-1;1; 4)); 0)
Both of these don't work. I can't figure out why. (BTW, the semicolon is ok where I live). I get #N/A everywhere and the message that the value has not been found in the match evaluation.
Any thoughts on what I am doing wrong? Also: if I would get this to work, are there any advantages or disadvantages over copying the simple formula all the way downwards? Thank you very much in advance.
Here's an ArrayFormula approach so you only need a single formula
=ArrayFormula(IFNA(IF(MATCH(A2:A,A2:A,0)<>ROW(A2:A)-ROW()+1,A2:A,"N/A")))
It is because second parameter of MATCH function requires range, but you in both cases pass the string. I don't know why you don't like the first option, but you may be able to use the OFFSET function.
=INDEX(A:A;MATCH(A2; OFFSET($A$1;0;0;ROW()-1); 0))

SUMIFS Values from non consecutive Column Cells

I need tu sum several cells that are separated one from another, these cells are
C3,F3,I3,L3,O3,R3,U3,X3,AA3,AD3,AG3,AJ3,AM3,AP3,AS3,AV3,AY3,BB3,BE3,BH3,BK3,BN3,BQ3,BT3,BW3,BZ3,CC3,CF3,CI3,CL3,CO3
if this other cells $C$1,$F$1,$I$1,$L$1,$O$1,$R$1,$U$1,$X$1,$AA$1,$AD$1,$AG$1,$AJ$1,$AM$1,$AP$1,$AS$1,$AV$1,$AY$1,$BB$1,$BE$1,$BH$1,$BK$1,$BN$1,$BQ$1,$BT$1,$BW$1,$BZ$1,$CC$1,$CF$1,$CI$1,$CL$1,$CO$1
that are on the same column but different row are >= to certain number given and <= to other given number, but it returns #Value, can somebody help me find out what am I doing wrong?
This is the function i am writing:
=SUMIFS((C3,F3,I3,L3,O3,R3,U3,X3,AA3,AD3,AG3,AJ3,AM3,AP3,AS3,AV3,AY3,BB3,BE3,BH3,BK3,BN3,BQ3,BT3,BW3,BZ3,CC3,CF3,CI3,CL3,CO3),($C$1,$F$1,$I$1,$L$1,$O$1,$R$1,$U$1,$X$1,$AA$1,$AD$1,$AG$1,$AJ$1,$AM$1,$AP$1,$AS$1,$AV$1,$AY$1,$BB$1,$BE$1,$BH$1,$BK$1,$BN$1,$BQ$1,$BT$1,$BW$1,$BZ$1,$CC$1,$CF$1,$CI$1,$CL$1,$CO$1),">="&B55,($C$1,$F$1,$I$1,$L$1,$O$1,$R$1,$U$1,$X$1,$AA$1,$AD$1,$AG$1,$AJ$1,$AM$1,$AP$1,$AS$1,$AV$1,$AY$1,$BB$1,$BE$1,$BH$1,$BK$1,$BN$1,$BQ$1,$BT$1,$BW$1,$BZ$1,$CC$1,$CF$1,$CI$1,$CL$1,$CO$1),"<="&C55)
I'm not 100% certain, but it looks like the problem here is that SUMIFS requires arguments to be expressed in continuous-range form, e.g. A3:CO3. It looks like you're trying to work with every third column in the dataset, yes? As far as I can tell, this is best (only?) done as an array function, so that you can tell it to filter on "every third column."
Enter this in the cell, then press CTRL+SHIFT+Enter (CSE) to evaluate it as an array function:
=SUM(($A$1:$CO$1>=B55)*($A$1:$CO$1<=C55)*(MOD(COLUMN(A3:CO3),3)=0)*(A3:CO3))
You'll also need to hit CSE every time you evaluate or change it. There's a decent tutorial for array functions at https://support.office.com/en-za/article/Guidelines-and-examples-of-array-formulas-7d94a64e-3ff3-4686-9372-ecfd5caa57c7, which may help if you're unfamiliar with them.

Google Spreadsheet range names

In Google Docs Spreadsheets, one can use Range Names to put labels on ranges of cells to make formulas more legible. In most formulas, one can use the range C:C to denote the entire C column, and C2:C to denote the entire C column after and including C2.
Is there a way to create range names of the same nature? When I try C:C or C2:C or Sheet!C:C or 'Sheet'!C:C I always get the error "The range you specified is not in a valid range format." I would like the range name to expand as my form adds rows to my spreadsheet. Thanks.
I just discovered the if you use the '-' operator, it starts from the bottom row. So,
=INDIRECT("-D:D12")
starts from the last row and works it's way up to D12!
I had a similar problem. Although I do not know how to do exactly what you are asking, you can do essentially the same thing by referencing cells that are not yet created.
For example:
Column C currently has 100 cells (100 rows in the sheet)
Instead of referencing it with C:C, use C1:C999
If you make the row reference high enough, then you can account for future rows that you will create. Hope it helps.
I don't think so... even if you select a column manually while in the Range Name selector, it complains. That would be a nice feature and it would make sense since they support column ranges for formulas already.
I believe this does work now. I have a range name of "Sheet1!A10:AW10" with no problems.
If you try to do a whole column, I think it will just take all the available cells in the column at that time. i.e. if you make more cells later, you need to manually add to the range name.
I had the same problem with ranges such as A3:A which normally work in other places such as ARRAYFORMULA(), but the workaround is to not specify the starting row, such as A:A. In cases when this would be a problem, you can proxy the data through another column using something like ARRAYFORUMULA(A25:A) as the formula.
Update: Apparently I haven't read the question properly. I see that the OP had tried leaving out the row number, so perhaps it wasn't working at that time, but it does now. The notations still don't work.
Update2: I didn't notice that google spreadsheet replaces ranges like A:A to A1:A50, so new rows added later on do not still get included. That I think is what #Dean is trying to say in his answer.
I think it's a helpful tool to use Insert -> Define new range to make a wizard appear and make the syntax correct. Hehe
My response in other topic

Resources