Compare rows between two sheets - Function - google-sheets

I think I’ve ‘over thought’ my problem so I’m starting from the beginning again.
I have two spreadsheets, one is an original database (org_DB) and the second is the updated database (new_DB). The number of rows is around 15,000 for org_DB and 18,000 for new_DB. The number of relevant contiguous columns is exactly 14 in both.
I need a third sheet, the results, that contains only the new_DB entries that are DIFFERENT from the original database, and, all new_DB entries that are ADDITIONAL.
The definition of DIFFERENT is a row with greater than 0 differences.
The definition of ADDITIONAL is a row that has no equivalence.
I think I need the two definitions because ‘new DB’ is longer than ‘org DB’ and all my formulas fail at the end point of org_DB.
The two test sized DB are:
org_DB
Code 1 Code 2 Code 3 Code 4
AA00001 AAGA 1180218 24
AA00007 AAGA 03821787-97 58
AA00008 AAGA 11821260-99 59
AA00009 AAGA 11001017 60
AA00016 AAGA 3801648 67
AA00017 AAGA 3801649 120
AA00018 AAGA 3801692 66
AA00019 AAGA 03821084-61 70
new_DB
Code 1 Code 2 Code 3 Code 4
AA00001 AAGA 1180218 24
AA00008 AAGA 11821260-99 59
AA00009 AAGA 11001015 60
AA00016 AAGA 3801648 67
AA00017 AAGA 3801649 120
AA00018 AAGA 3801692 69
AA00019 AAGA 03821084-61 70
XX00101 XXGA 1234X567X 101
XX00102 XXGB 1234X567X 101
Result DB (the result I am looking for)
AA00009 AAGA 11001015 60
AA00018 AAGA 3801692 69
XX00101 XXGA 1234X567X 101
XX00102 XXGB 1234X567X 101
For row comparison (which works on a row by row basis) I’m using
=if(ArrayFormula(sum(--(new_DB!A2:D2=org_DB!A2:D2)))<>4,"Copy row","Ignore")
To get the result array I’m using
=filter(new_DB,if(ArrayFormula(sum(--(new_DB=org_DB)))<>4)
Problem 1 is that the FILTER condition argument only gets a single formula result from the ArrayFormula so fails with an #N/A - “FILTER has mismatched range sizes. Expected row count: nn, column count: 1. Actual row count: 1, column count: 1.”
Problem 2 is that the ArrayFormula after IF is comparing 1 row and nn columns - which I want. Wrapping the whole function in another ArrayFormula gives even stranger results.
Problem 3. Changing the row comparison function from IF ArrayFormula to SUMPRODUCT produces the wrong result when used in an ArrayFormula wrapper.
I can see that if I use this method the recursion process is likely to be very lengthy, so I've come to accept that my method is fundamentally flawed. Should I use a VLOOKUP and FILTER combo on column A? Column A is actually a SKU ID, so it should always be unique.
Can anyone help please. TIA
Note that org_DB row 3 (AA0007...) is not in the results. Deliberate.
Test sheet here: Test DB Sheet

This will look really ugly really quickly with a lot of columns, which is why I'm asking if you have any columns you can limit yourself to.
=ARRAYFORMULA(FILTER(
'New DB'!A2:D10,
ISERROR(MATCH('New DB'!A2:A10 & "|" &
'New DB'!B2:B10 & "|" &
'New DB'!C2:C10 & "|" &
'New DB'!D2:D10,
'Org DB'!$A$2:$A$9 & "|" &
'Org DB'!$B$2:$B$9 & "|" &
'Org DB'!$C$2:$C$9 & "|" &
'Org DB'!$D$2:$D$9,
0))))
This filters New DB down to the rows whose concatenated columns cannot be found in Org DB, which covers both changed rows and added rows. If your data can contain |, use any other delimiter.
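If it helps to see the same idea outside of Sheets, here is a rough Python sketch of what the formula does: join each row into one key, then keep the New DB rows whose key is missing from Org DB. The lists `org_db` and `new_db` and the function `changed_or_added` are made-up names for illustration, filled with the test data from the question.

```python
# Hypothetical in-memory stand-ins for the 'Org DB' and 'New DB' sheets.
org_db = [
    ["AA00001", "AAGA", "1180218", "24"],
    ["AA00007", "AAGA", "03821787-97", "58"],
    ["AA00008", "AAGA", "11821260-99", "59"],
    ["AA00009", "AAGA", "11001017", "60"],
    ["AA00016", "AAGA", "3801648", "67"],
    ["AA00017", "AAGA", "3801649", "120"],
    ["AA00018", "AAGA", "3801692", "66"],
    ["AA00019", "AAGA", "03821084-61", "70"],
]
new_db = [
    ["AA00001", "AAGA", "1180218", "24"],
    ["AA00008", "AAGA", "11821260-99", "59"],
    ["AA00009", "AAGA", "11001015", "60"],
    ["AA00016", "AAGA", "3801648", "67"],
    ["AA00017", "AAGA", "3801649", "120"],
    ["AA00018", "AAGA", "3801692", "69"],
    ["AA00019", "AAGA", "03821084-61", "70"],
    ["XX00101", "XXGA", "1234X567X", "101"],
    ["XX00102", "XXGB", "1234X567X", "101"],
]


def changed_or_added(new_rows, org_rows, delimiter="|"):
    """Return the rows of new_rows whose joined key is absent from org_rows."""
    # Equivalent of concatenating the Org DB columns with a delimiter.
    org_keys = {delimiter.join(row) for row in org_rows}
    # Equivalent of FILTER(..., ISERROR(MATCH(...))): keep rows with no match.
    return [row for row in new_rows if delimiter.join(row) not in org_keys]


result = changed_or_added(new_db, org_db)
for row in result:
    print(" ".join(row))
```

Because the whole row is the key, a row that changed in any column and a row that is completely new both fail the lookup, so one test handles DIFFERENT and ADDITIONAL together.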

Related

How to find the row of a specified value, then return a corresponding value from a different column in Google Sheets

I'm trying to write a formula in Google Sheets which can first locate the row of a specific value. Then index to the value contained on that row a few columns over.
Let's assume the following
A B C
1 12 80
2 43 35
3 64 15
4 13 56
5 44 93
6 86 48
7 14 31
8 41 3
9 63 56
10 11 46
Values in columns B and C have a correlated relationship. I need to first locate a specific value in column B, then find its corresponding value on the same row in column C.
For the sake of example, let's assume I'm trying to locate the row containing the value 41 in column B, and then return the corresponding value in column C, which in this case would be 3.
The reason why I need a formula like this is because the data I'm using is highly variable and large. Over 4000 rows. It is unknown what rows the values to be found sit on.
You may try either:
=filter(C:C,B:B=D2)
OR
=xlookup(D2,B:B,C:C,)
filter() returns the column C value for every row whose column B value equals D2 (41 in this example), while xlookup() returns only the first match.
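To make the difference between the two concrete, here is a small Python sketch of the two behaviours using the sample data from the question. The function names `filter_matches` and `first_match` are invented for this illustration and are not Sheets functions.

```python
# Sample data from the question: column B and column C.
col_b = [12, 43, 64, 13, 44, 86, 14, 41, 63, 11]
col_c = [80, 35, 15, 56, 93, 48, 31, 3, 56, 46]


def filter_matches(keys, values, target):
    """Like FILTER(C:C, B:B=target): every value whose key equals target."""
    return [value for key, value in zip(keys, values) if key == target]


def first_match(keys, values, target, missing=None):
    """Like XLOOKUP(target, B:B, C:C): only the first value whose key matches."""
    for key, value in zip(keys, values):
        if key == target:
            return value
    return missing


print(filter_matches(col_b, col_c, 41))
print(first_match(col_b, col_c, 41))
```

With unique values in column B both give the same answer; they only differ when the searched value occurs more than once.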

Google Sheets Transpose last columns values

I have this table in Google Sheets
Month 1 2 3 ...
1 20 30 45
2 32 47
3 53
...
How do I transpose the last value of each column into this?
Month lastValue
1 20
2 32
3 53
...
I'm not sure I've understood your question correctly, but in your situation, how about the following sample formula?
Sample formula:
=BYROW(B2:D,LAMBDA(x,IFERROR(INDEX(SPLIT(TEXTJOIN(",",TRUE,x),","),1))))
In this formula, TEXTJOIN and SPLIT are used to drop the empty cells at the start of each row, and then the 1st remaining cell is retrieved. BYROW applies this to every row.
As another approach, the formula =BYROW(B2:D,LAMBDA(x,IFERROR(INDEX(FILTER(x,x<>""),1)))) might also work.
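The per-row logic can be sketched in Python as follows. This assumes the table is laid out as a triangle with empty leading cells (my reading of the question, since the blanks are not visible in the post); `rows` and `first_non_empty` are names made up for the sketch.

```python
# Assumed layout of B2:D, with "" standing in for an empty cell.
rows = [
    [20, 30, 45],
    ["", 32, 47],
    ["", "", 53],
]


def first_non_empty(row):
    """Like INDEX(FILTER(x, x<>""), 1) wrapped in IFERROR: first filled cell, else ""."""
    for cell in row:
        if cell != "":
            return cell
    return ""


last_values = [first_non_empty(row) for row in rows]
print(last_values)
```

Under that layout the first filled cell of each row is also the last value of the matching column, which is why a row-wise formula answers a column-wise question here.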
Testing:
When this formula is used in your provided situation, the following result is obtained.
References:
TEXTJOIN
SPLIT
BYROW

How can I rearrange a Google Sheet using the data within the sheet?

I get massive serialized spreadsheets in the following format:
PN SN Qty
1 24 3
2 25 1
3 26 7
I need to write a Sheets script that can rearrange the data so that the headers are gone, and the quantities are extrapolated, then cleared.
For example, the desired result would be:
1 24
1 24
1 24
2 25
3 26
3 26
3 26
3 26
3 26
3 26
3 26
I have tried writing a few recursive statements to achieve this, however once I started adding in new rows to the sheet my loop breaks. I've tried hundreds of different iterations of what I know should be a fairly simple task but alas, I am well out of practice. I fear at this point I am fixated on the wrong idea. Any help in the right direction would be greatly appreciated!
You don't need to use Google Apps Script.
Try the following formula:
={ARRAYFORMULA(TRIM(TRANSPOSE(SPLIT(QUERY(REPT(A2:A&"♠", C2:C), ,999^99), "♠")))),
ARRAYFORMULA(TRIM(TRANSPOSE(SPLIT(QUERY(REPT(B2:B&"♠", C2:C), ,999^99), "♠"))))}
Different approach, same result:
=ArrayFormula(SPLIT(QUERY(FLATTEN(SPLIT(FILTER(REPT(A2:A&"\"&B2:B&"^",C2:C),A2:A<>""),"^",0,1)),"Select * Where Col1 <>''"),"\"))
I'm adding this only because different people may find one or the other easier to understand and apply. There is no practical or performance gain to this formula over the great suggestion given by Marios.
NOTE: This formula makes use of FLATTEN, a Google Sheets function that was still undocumented at the time of writing.
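For anyone who finds the REPT/SPLIT trick hard to follow, this is the underlying operation written as a short Python sketch: repeat each (PN, SN) pair Qty times and drop the header and the Qty column. The variable names are only for illustration.

```python
# Sample rows from the question as (PN, SN, Qty), header already removed.
rows = [
    (1, 24, 3),
    (2, 25, 1),
    (3, 26, 7),
]

# Repeat each (PN, SN) pair Qty times, which is what REPT does with the joined text.
expanded = [(pn, sn) for pn, sn, qty in rows for _ in range(qty)]

for pn, sn in expanded:
    print(pn, sn)
```

The output has one line per unit, so its length equals the sum of the Qty column.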

ARRAYFORMULA with repetition

I have two columns of data, and would like to distribute the elements of one of these columns over several rows. I can easily calculate the index of the element I need, but cannot figure out how to access the element.
Formula for index: =ARRAYFORMULA(IF(A:A,CEILING(ROW(A:A)/3+1),""))
A  B   Desired output  Index
1  11  22              2
2  22  22              2
3  33  22              2
4  44  33              3
5      33              3
6      33              3
7      44              4
How can I modify my formula for the index so that it yields the item of column B at the calculated index?
I tried =ARRAYFORMULA(IF(A:A, INDEX(B:B, CEILING(ROW(A:A)/3+1), 1), "")) but that only repeats the first element (22) 7 times.
Use Vlookup instead of Index:
=ARRAYFORMULA(IF(A:A,vlookup(CEILING(ROW(A:A)/3+1),A:B,2),""))
EDIT
It isn't necessary to use a key column, you could use something like this:
=ARRAYFORMULA(vlookup(CEILING(sequence(counta(B:B)*3)/3+1),{row(B:B),B:B},2))
assuming you wanted to generate three rows for each non-blank row in column B not counting the first one.
Or if you want to be different, use a concatenate/split approach:
=ArrayFormula(flatten(split(rept(filter(B:B,B:B<>"",row(B:B)>1)&"|",3),"|")))
(all the above assume you want to ignore the first row in col B and start with 22).
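As a cross-check of the two ideas above (look up by calculated index, or simply repeat each element), here is a Python sketch using the sample column B. The helper `value_for_row` is a hypothetical name, and the repeat count of 3 comes from the question.

```python
import math

# Column B from the question; the first element (11) is skipped, as in the answers.
col_b = [11, 22, 33, 44]


def value_for_row(row, values, repeat=3):
    """Pick the element at the 1-based index CEILING(row / repeat + 1)."""
    index = math.ceil(row / repeat) + 1  # same value as CEILING(ROW/3+1)
    return values[index - 1]


# Lookup approach: one result per sheet row 1..7, as in the question's table.
by_index = [value_for_row(row, col_b) for row in range(1, 8)]

# Repetition approach: repeat every element after the first three times.
by_repeat = [value for value in col_b[1:] for _ in range(3)]

print(by_index)
print(by_repeat)
```

The lookup version stops wherever the row numbers stop, while the repetition version always emits a full group of three for every element.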

select less than and replace with value in column A

I have a table with a few thousand rows and columns; it looks something like this:
ID Distance1 Distance2
1 102 101
2 101 100
3 100 99
4 99 98
5 98 97
...
I would like to select all values/distances in columns B and C that are less than 100 and replace them with the value in column A (their ID number).
All distances above 100 I want to delete. The real table has several thousand columns. How can I do this?
I have tried using search and replace, and conditional formatting where I have tried creating new rule using Index + Match but I encounter errors.
Assuming ID is in A1 of Sheet1, copy the headings row into A1 of a new sheet and enter the following in B2 of that sheet:
=IF(AND(Sheet1!B2<100,Sheet1!B2>0),Sheet1!$A2,"")
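Copy across and down to suit. Then select the whole new sheet, copy it, and use Paste Special > Values over the top so that the formulas are replaced by their results.
The formula above treats exactly 100 the same as values over 100 (the cell is blanked), and assumes there are no distances of 0 or less.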
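The rule the formula applies to each cell can be sketched in Python like this, using the sample rows from the question. `replace_distances` is a made-up helper name, and "" stands in for a blanked cell.

```python
# Sample rows from the question as (ID, Distance1, Distance2).
table = [
    (1, 102, 101),
    (2, 101, 100),
    (3, 100, 99),
    (4, 99, 98),
    (5, 98, 97),
]


def replace_distances(row):
    """Keep the ID; swap each distance in (0, 100) for the ID and blank the rest."""
    row_id, *distances = row
    # Mirrors IF(AND(d<100, d>0), ID, ""): 100 and above, and 0 or below, are blanked.
    return [row_id] + [row_id if 0 < d < 100 else "" for d in distances]


converted = [replace_distances(row) for row in table]
for row in converted:
    print(row)
```

Since the test only looks at one cell and the row's ID, it extends to any number of distance columns without changes.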
