How to Compare lists of values (Sheets) - google-sheets

I am looking to compare cells containing a list of comma separated strings to a column containing solutions made up of comma-separated lists.
My test data
Each cell in the "Test lists" column will need to be checked against every one of the cells in the "Possible Solutions" column. If any exist that contain all of the values of the list we are currently testing, it should return TRUE. Otherwise, FALSE. It is OK if the possible solution contains MORE values, just not less.
The image above contains the best solution to this I have managed to work out so far ... the "Should be" column shows the answer I am expecting.
Any suggestions on other things I might try?

This should work. Paste in cell D2 and drag!
=ARRAYFORMULA(SUM(N(ARRAYFORMULA(
LEN(FILTER($A$2:$A, $A$2:$A<>""))
- LEN(REGEXREPLACE(FILTER($A$2:$A, $A$2:$A<>""), SUBSTITUTE($B2, ", ", "|"), ""))
= LEN(SUBSTITUTE($B2, ", ", ""))
)))) > 0
The general idea here is to extract all test list elements from a given possible solution list, and then check whether the number of characters removed equals the length of the test list. If that's true, then the test list is contained within the solution list.
The inner ARRAYFORMULA() extends that search scheme to every possible solution list (i.e. to all of column A); the outer ARRAYFORMULA(SUM(N(...))) > 0 returns TRUE if any of the possible solutions pass the test.

Related

Compiling a list using INDEX but need to skip certain rows

I'm compiling a list based on the first answers recieved between row N and AF.
I'm using these two formulas:
=INDEX(N2:O2,MATCH(FALSE,ISBLANK(N2:O2),0))
and
=INDEX(R2:AF2,MATCH(FALSE,ISBLANK(R2:AF2),0))
Is there a way to combine them whilst not searching in rows P & Q?
These are generated from a Form response so can't just be switched around.
try:
=INDEX({N2:O2, R2:AF2}, MATCH(FALSE, ISBLANK({N2:O2, R2:AF2}), 0))
If Sheet1 is an intake sheet of form results, you should not add any data, formulas or even formatting to that sheet. It virtually always causes issues. A form intake sheet should be left exactly as it is. A new sheet can then be used to bring over the results of the form intake sheet as you want to see them.
However, since you didn't specify any of that, I will supply a formula written to work in the same sheet as your posted example and in-sheet examples.
Clear an entire column and place the following in the top cell of that column:
=ArrayFormula({"Attendee Name"; IF(E2:E="",,IFERROR(REGEXEXTRACT(TRIM(TRANSPOSE(QUERY(TRANSPOSE(FILTER(IF(N2:AK="",,N2:AK&"~"),N1:AK1=N1)),,COLUMNS(N1:AK1)))),"\s*([^~]+)"),"(none listed)"))})
This one formula will produce a header (the text of which you can change within the formula itself as you lie) and all valid results for all rows.
The inner IF will append a tilde (~) to any non-null entries in the range N2:AK.
FILTER will keep only those columns in this range where the header is the same as the header in N1 (i.e., "Attendee Name").
TRANSPOSE(QUERY(TRANSPOSE( ),,COLUMNS( ))) is colloquially called a "Query smash." It will form one cell from all horizontal results per row.
TRIM will cut any preliminary spaces and form a true string.
REGEXEXTRACT will pull the from the first non-space character up to but not including the first tilde (from those appended in the first step)—in other words, the first full valid entry from any column.
IFERROR will return a message if there is an error, with the likely error being that there were no valid entries for "Attendee name" in any column.
The outer IF will leave the cell blank if the no training event exists in E2:E.
{ } forms a virtual array that places the header over all other results.
ArrayFormula( ) signifies that multiple results will be processed at once.
Because this is an array formula that is being "asked" to process every row, you cannot manually type into any cell of this results column. If you do, you will "break the array"; everything except what you just typed will disappear, leaving only an error in the formula cell. If you need to add or change a name, you need to do that in the raw results range (e.g., manually type a name or a new name in Col N), which will then turn up in the formula output range.

JOIN header row values across a row based on non-blank values in cells

So I have two rows:
ID
TagDog
TagCat
TagChair
TagArm
Grouped Tags (need help with this)
1
TRUE
TRUE
TagDog,TagArm
Row 1 consists mainly of Tags, while rows 2+ are entries. This data ties ENTRIES to TAGS.
What I'm needing to do is concatenate/join the tag names per entry. For example, look at the last column above.
I suspect we could write a formula that would:
Create an array of non-empty cells in the row. (IE: [2,4])
Return it with the header row A (IE: [A2,A4])
Then join them together by a comma
But I am unsure how to write the formula, or if this is even the best approach.
Here's the formula:
={
"Grouped Tags (need help with this)";
ARRAYFORMULA(
REGEXREPLACE(TRIM(
TRANSPOSE(QUERY(TRANSPOSE(
IF(NOT(B2:E11),, B1:E1)
),, COLUMNS(B1:E1)))
), "\s+", ",")
)
}
The trick used is called vertical query smash. That's the part:
TRANSPOSE(QUERY(TRANSPOSE(...),, Nnumber_of_columns))
You can find a brief description of this one and his friends here.
I wasn't able to create a single formula that would do this for me, so instead, I utilized a formula inside of Sheets' Find/Replace tool, and it worked like a charm!
I did a find/replace, replacing all instances of TRUE with the following formula:
=INDIRECT(SUBSTITUTE(LEFT(ADDRESS(ROW(),COLUMN()),3),"$","")&"$1")
What this formula does is it finds the cell's letter, then gets the first row of the cell using INDIRECT.
Breaking down the formula:
ADDRESS(ROW(),COLUMN()) returns the direct reference: $H$1
LEFT("$H$1",3) returns $H$
SUBSTITUBE("$H$","$","") replaces the dollar signs ($) and returns H
INDIRECT(H&"$1") references the exact cell H$1
Now, I can replace all instances of TRUE with that formula and the magic happens!
Here is a video explanation: https://youtu.be/SXXlv4JHDA8
Hopefully, that helps someone -- however, I would still be interested in seeing what the formula is for this solution.

Check if data that satisfies multiple criteria exists in both sheet 1 and sheet 2

My table contains 2 sheets with a different number of columns. I want to add a column that will display true or false (or any other 2 opposite values ) for each row depending on whether this row satisfies 2 criteria which are: sheet1!col1=sheet2!col1 and sheet1!col2=sheet2!col2.
You'll find an illustration below.
I've tried using
ARRAYFORMULA(VLOOKUP(A1&B1, {Sheet1!A1:A4&Sheet1!B1:B4,Sheet1!C1}, 3))
but I get an error message
vlookup evaluates to an out of bound range
So I wanted to try
QUERY({Sheet1!A1:B4,A1:B5}, "Select C where ")
but I couldn't figure out how to write the condition where (sheet1)col1=(sheet2)col1 & (sheet1)col2=(sheet2)col2 and I also don't know if I can work with tables of different dimensions. I finally tried
=MATCH(A1&B1,{Sheet1!A1:A&Sheet1!B1:B})
but it always returns 1.
Any idea please?
Sheet 1
Sheet 2
Your first formula is almost right. You are getting the error message because there is only one column in the curly brackets so you have to change it to
=ArrayFormula(vlookup(A1&B1,{Sheet2!A:A&Sheet2!B:B},1,false))
and add the 'false' to make sure it only does exact matches.
To make the query work you need the right syntax to access cells in the current sheet:
=query(Sheet2!A:B," select A,B where A='"&A1&"' and B='"&B1&"'")
To make the match work, you need to enter it as an array formula and add a zero to specify exact match:
=ArrayFormula(MATCH(A1&B1,{Sheet2!A:A&Sheet2!B:B},0))
However I would take flak from my colleagues if I didn't point out that there is an issue with the vlookup and match as shown above - toto&moto would match with not just toto&moto, but also with tot&omoto etc. The way round this is to add a separator character e.g.
=ArrayFormula(vlookup(A1&"|"&B1,{Sheet2!A:A&"|"&Sheet2!B:B},1,false))
=ArrayFormula(MATCH(A1&"|"&B1,{Sheet2!A:A&"|"&Sheet2!B:B},0))
These still need some tidying up if they are to report Yes and No, and also not to give false positive on blank rows - also the vlookup and match can be written as self-expanding array formulas - but that is the short answer to the question.

Get data between number two and three delimiter

I have a large list of people where each person has a line like this.
Bill Gates, IT Manager, Microsoft, <https://www.linkedin.com/in/williamhgates>
I want to extract the company name in a specific cell. In this example, it would be Microsoft, which is between the second and third delimiters (in this case, the delimiter is ", "). How can I do this?
Right now I'm using the split method (=SPLIT(A2, ", ",false)). But it gives me four different cells with information. I would like a command only to output the company in one cell. Can anyone help? I have tried different things, but I can't seem to find anything that works.
Maybe some regex can do it, but I'm not into regex.
Short answer
Use INDEX and SPLIT to get the value between two separators. Example
=INDEX(SPLIT(A1,", ",FALSE),2)
Explation
SPLIT returns an 1 x n array.
The first argument of INDEX could be a range or an array.
The second and third arguments of INDEX are optional. If the first parameter is an array that has only one row or one column, it will assume that the second argument corresponds to the larger side of the array, so there is no need to use the third argument.
A bit nasty, but this formula works, assuming data in cell D3.
=MID(D3,FIND(",",D3,FIND(",",D3)+1)+2,FIND(",",D3,FIND(",",D3,FIND(",",D3)+1)+1)-FIND(",",D3,FIND(",",D3)+1)-2)
Broken down, this is what it does:
Take the Mid point of D3 =MID(D3
starting two characters after the 2nd comma FIND(",",D3,FIND(",",D3)+1)+2
and the number of characters between the 2nd and 3rd comma, excluding spaces FIND(",",D3,FIND(",",D3,FIND(",",D3)+1)+1)-FIND(",",D3,FIND(",",D3)+1)-2)
I'll add my favourite ArratFormula, which you could use to expand list automatically without draggind formula down. Assumptions:
you have list with data in range "A1:A20"
all data have same sintax "...,Company Name, <..."
In this case you could use Arrayformula, pasted in cell B1:
=ArrayFormula(REGEXEXTRACT(A1:A20,", ([^,]+), <"))
If your data doest's always look like "...,Company Name, <..." or you wish to get different ounput, use this formula in cell B1:
=QUERY(QUERY(TRANSPOSE(SPLIT(JOIN(", ",A1:A20),", ",0)),"offset 2"),"skipping 4")
in this formula:
change 2 in offset 2 to 0, 1, 2, 3 to get name, position, company, link
in skipping 4 4 is a number of items.
Number of items can be counted by formula:
=len(A1)-len(SUBSTITUTE(A1,",",""))+1
and final formula is:
=QUERY(QUERY(TRANSPOSE(SPLIT(JOIN(", ",A1:A20),", ",0)),"offset 2"),
"skipping "&len(A1)-len(SUBSTITUTE(A1,",",""))+1)

How to assign a unique ID to a google form input?

Google Forms - I have set up a google form and I want to assign a unique id each of the completed incoming form inputs. My intention is to use the unique ID as an input for another google form I have created which I will use to link the two completed forms. Is there another easier way to do this?
I'm not a programmer but I have programming resources available to me if needed.
I was also banging my head at this and finally found a solution.
I compose a 6-digit number that gets generated automatically for every row and is composed of:
3 digits of the row number - that gives the uniqueness (you can use more if you expect more than 998 responses), concatenated with
3 digits of the timestamp converted to a number - that prevents guessing the number
Follow these instructions:
Create an additional column in the spreadsheet linked to your form, let's call it: "unique ID"
Row number 1 should be populated with column titles automatically
In row number 2, under column "Unique ID", add the following formula:
=arrayformula( if( len(A2:A), "" & text(row(A2:A) - row(A2) + 2, "000") & RIGHT(VALUE(A2:A), 3), iferror(1/0) ) )
Note: An array formula applies automatically to the entire column.
Make sure you never delete that row, even if you clear up all the results from the form
Once a new submission is populated, its "Unique ID" will appear automatically
Formula explanation:
Column A should normally hold the timestamp. If the timestamp is not empty, then this gives the row number: row(A2:A) - row(A2) + 2
Using text I trim it to a 3-digit number.
Then I concatenate it with the timestamp converted to a number using VALUE and trim it to the three right-most digits using RIGHT
Voila! A number that is both unique and hard-to-guess (as the submitter has no access to the timestamp).
If you would like more confidence, obviously you could use more digits for each of the parts.
You can apply unique ID numbers using an arrayformula next to the form data. In row 1 of the first rightmost empty column you can use something like
=arrayformula(if(row(A1:A)=1,"UNIQUE ID",if(len(A1:A)>0,98+row(A1:A),iferror(1/0))).
A few comments regarding the explanation provided by #Ying, which I will try to expand, as it is very good.
> Column A should normally hold the timestamp.
In my case, it is date+time stamp.
> 4. Make sure you never delete that row,
even if you clear up all the results from the form
That issue can easily be avoided by placing the formula in the header like this
={"calculated_id";arrayformula( if( len(C2:C); "" & text(row(C2:C) - row(C2) + 2; "000") & RIGHT(VALUE(C2:C); 3); iferror(1/0) ) )}
This formula provides an string for one cell, and a formula for the next one, which happens to be an array formula which will cover all the cells below.
Note: Depending on your language settings you may need to use ";" or "," as separator among parameters.
> 5. Once a new submission is populated,
its "Unique ID" will appear automatically
Issue
And here is the issue I see with this solution.
If the Google Form allows responders to Edit their responses, the date+time stamp will change and so the calculated_id.
A workaround is to have 2 columns, one is the calculated_id and the other will be static_id.
static_id will take whatever is on calculated_id only if itself has no data, otherwise it will stay as it is.
Doing that we will have an ID that will not change no matter how many updates the response experience.
The sort formula for static_id is
=IF(AND(IFERROR(K2)<>0;K2<>"");K2;L2)
The large one is
={"static_id";ArrayFormula(IF(AND(IFERROR(M2:M)<>0;M2:M<>"");M2:M;L2:L))
}
M or K -> static_id
L -> calculated_id
Remember to put this last one on the header of the column. I tend to change the color to purple when it has a formula behind, so I don't mess with it by mistake.
Extra info.
The numeric value from the date/time stamp differs when it comes from both or just one. Here are some examples.
Note that the number of digits on the fractional part differ quite a lot depending on the case.

Resources