joining 2 tables with specific conditions

joining 2 tables with specific conditions - join

I have 2 tables. First is original and contains row_number and several fields.
row_num col1 col2
1 a b
2 c d
3 e f
Second table is areport from first table, which shows type of mistake, col_name and row number:
m_id field row_num
m1 col1 1
m1 col1 3
m2 col2 2
And my task is to join to 2nd table the values of each field in table 1 depending on row_num and column, should be smth like:
m_id field row_num value
m1 col1 1 a
m1 col1 3 e
m2 col2 2 d
I've tryed using varios transposes like using by and id arguments, but i understand that it would be easier to use macro.
proc sort data=a1;
by row_num;
run;
proc transpose data=a1 out=a2;
by row_num;
var m_id;
run;

You could merge the (sorted) sets and use VVALUEX() to programmatically get the (character) value of a variable by name.
Something like:
data combined;
merge first second;
by row_num;
value = vvaluex(field);
keep m_id field row_num value;
run;
This assumes that row_num is unique in the first set.

Related

Aggregating rows with query in google sheets

I have a data set that looks something like this:
Column A
Column B
category 1
Team 1
1.category 1
Team 1
2.category 2
Team 1
category 2
Team 1
category ３
Team 1
３.category ３
Team 1
I am trying to use query function with a pivot statement to calculate the occurrence of each category for team 1 (I have several other teams in the data set, but for simplicity I just wrote out my example with team 1). Unfortunately the naming of the categories are not consistent in the original data, and I cannot change them.
So I need a way to combine the results of the sum of category 1 and 1.category1, and so on.
How could I handle rewrite this to get the type of result as listed below?
Category
Team 1
category 1
2
category 2
2
category ３
2
The formula I have now is as following:
query('sheet1!A:B,"Select A, count(B) where B='Team 1' group by A pivot B label B 'Team 1'",1)

If the category names all have a similar format to those in your example (with extraneous data only at the beginning, followed by 'category N', and you don't care if zero counts per category are left blank then a more compact approach then the previous answer is (for any number of teams/categories):
=arrayformula(query({regexextract(A2:A,"category.+"),B2:B},"select Col1,count(Col1) where Col2 is not null group by Col1 pivot Col2 label Col1 'Category'",0))

formula:
=ArrayFormula(
LAMBDA(DATA,CATEGORY,
LAMBDA(RESULT,
LAMBDA(RESULT,
IF(RESULT="",0,RESULT)
)(QUERY(SPLIT(TRANSPOSE(SPLIT(RESULT,"&")),"|"),"SELECT Col1,SUM(Col3) GROUP BY Col1 PIVOT Col2 LABEL Col1'Category'",0))
)(
JOIN("&",
BYROW(CATEGORY,LAMBDA(CAT,
JOIN("&",CAT&"|"&BYROW(TRANSPOSE(QUERY(DATA,"SELECT COUNT(Col1) WHERE lower(Col1) CONTAINS'"&CAT&"' PIVOT Col2",0)),LAMBDA(ROW,JOIN("|",ROW))))
))
)
)
)({ASC($A$2:$B$7)},{"category 1";"category 2";"category 3"})
)
use ASC() to format all numbers-like values into number,
use {} to create the match conditions,
iterate the conditions with BYROW() and...
use QUERY() with CONTAINS to COUNT matches of the given conditions,
use TRANSPOSE() to turn the match results of each row sideway,
change the results into string with JOIN(), this helps to modify the row and column arrangment,
SPLIT() the data to create the correct array format we can use,
use QUERY() to PIVOT the SUM of the COUNT result as our final output.
Another approch works in a slightly different concept:
=ArrayFormula(
LAMBDA(DATA,CAT,
LAMBDA(DATA,
LAMBDA(COLA,COLB,
LAMBDA(COLA,
LAMBDA(RESULT,
IF(RESULT="",0,RESULT)
)(TRANSPOSE(QUERY({COLA,COLB},"SELECT Col2,COUNT(Col2) GROUP BY Col2 PIVOT Col1 LABEL Col2'Category'",0)))
)(REGEXEXTRACT(COLA,JOIN("|",CAT)))
)(INDEX(DATA,,1),INDEX(DATA,,2))
)(ASC(DATA))
)($A$2:$B$7,{"category 1","category 2","category 3"})
)
We can modify the Category column of the input data with REGEXEXTRACT() before sending it into query, which in this case, do make the formula looks a bit cleaner.
Inspired by #The God of Biscuits 's answer, we can now get rid of the CAT variable, which makes the formula more elastic to fit into your condition.
This REGEXEXTRACT() will extract Category value from the 1st 'category' match found to the end of the 1st 'number' after it, with any spacing in between the two value.
=ArrayFormula(
LAMBDA(DATA,
LAMBDA(COLA,COLB,
LAMBDA(RESULT,
IF(RESULT="",0,RESULT)
)(TRANSPOSE(QUERY({COLA,COLB},"SELECT Col2,COUNT(Col2) WHERE Col2 IS NOT NULL GROUP BY Col2 PIVOT Col1 LABEL Col2'Category'",0)))
)(REGEXEXTRACT(LOWER(INDEX(DATA,,1)),"((?:category)(?: +?)(?:[０-９]|[0-9])+)"),INDEX(DATA,,2))
)($A$2:$B)
)

You can also use filter with a count a like this:
=counta(filter(Sheet1!A:A,(Sheet1!A:A="category 1")+(Sheet1!A:A="1.category 1"),Sheet1!B:B="Team 1"))

How to Query function for multiple counts

I have a Google sheet being used as a property index which has a list of property with its summary details such as location, type, number of bedrooms etc.
I'm trying to count the number of how many 1 bedroom, 2 bedroom, 3 bedroom properties etc there are in each location.
However, I'm not sure how to do multiple counts, for example where column W = 1 and (next column) where W = 2 etc
=QUERY(Property_Location_Type,"select T, count(T) where W='1' Group by T",1)

you could try something alike:
formula in cell Y1 here:
=ARRAYFORMULA(QUERY({T2:T,TO_TEXT(IF(W2:W>2,">2",W2:W))&{"",""}},"Select Col1, Count(Col2) Where Col1!='' group by Col1 PIVOT Col3 label Col1 'Town/City'"))
-

How to get unique values in a column, including cells with multiple values seperated by commas, in Google Sheet?

I have a column in a Google Sheet, which in some cases, includes multiple values separated by commas — like this:
Value
A example
B example
C example
D example
A example, E example
A example, F example
G example, D example, C example
I would like to count all occurrences of the unique values in this column, so the count should look like:
Unique value
Occurrences
A example
3
B example
1
C example
2
D example
2
E example
1
F example
1
G example
1
Currently, however, when I use =UNIQUE(A2:A), the result gives this:
Unique value
Occurrences
A example
1
B example
1
C example
1
D example
1
A example, E example
1
A example, F example
1
G example, D example, C example
1
Is there a way I can count all of the instances of letters, whether they appear in individually in a cell or appear alongside other letters in a cell (comma-seperated)?
(This looks like a useful answer in Python, but I'm trying to do this in Google Sheets)

try:
Formula in C1:
=INDEX(QUERY(IFERROR(FLATTEN(SPLIT(A1:A,", ")),""),"Select Col1, count(Col1) where Col1 is not null group by Col1 label count(Col1) ''"))
Or, as per the comments, split on the combination instead:
=INDEX(QUERY(IFERROR(FLATTEN(SPLIT(A1:A,", ",0)),""),"Select Col1, count(Col1) where Col1 is not null group by Col1 label count(Col1) ''"))
2nd EDIT: To order descending by count use:
=INDEX(QUERY(IFERROR(FLATTEN(SPLIT(A1:A,", ",0)),""),"Select Col1, count(Col1) where Col1 is not null group by Col1 Order By count(Col1) desc label count(Col1) ''"))

Assuming data in A1:A7:
In C1:
=SORT(UNIQUE(FLATTEN(ARRAYFORMULA(SPLIT(A1:A7,", ")))))
In D1:
=ARRAYFORMULA(MMULT(0+ISNUMBER(SEARCH(", "&ColumnCSpilledRange&", ",", "&TRANSPOSE(A1:A7)&", ")),ROW(A1:A7)^0))
Replace ColumnCSpilledRange appropriately.

How can I avoid having to put 0s into the NULL fields to get a correct query calculation in Google Sheets

I have a Google Sheets question, which I have not been able to figure out yet with Google-Fu and RTFM:
Take the following spreadsheet as an example:
https://docs.google.com/spreadsheets/d/1IvMVaUdUDfYOoKyG0Uwd2n0M1mLjOTE5yZQ9K2R3q2M/edit?usp=sharing
In case the sheet gets lost in time, I am going to post its contents here:
Sheet1:
foo
withdrawal
deposit
C
4
10
D
10
E
10
4
As you see here, the withdrawal field for the D value being foo is empty, i.e. null
Sheet2:
foo
balance
C
=INDEX(QUERY({Sheet1!$A$2:C}, "SELECT SUM(Col3) - SUM(Col2) WHERE Col1 = '"&A2&"'"), 2)
D
=INDEX(QUERY({Sheet1!$A$2:C}, "SELECT SUM(Col3) - SUM(Col2) WHERE Col1 = '"&A3&"'"), 2)
E
=INDEX(QUERY({Sheet1!$A$2:C}, "SELECT SUM(Col3) - SUM(Col2) WHERE Col1 = '"&A4&"'"), 2)
The result is
foo
balance
C
6
D
E
-6
As you see, the balance field for the category D is null, although it should be -10.
The fix for that is to put a 0 into the deposit field in Sheet1 explicitly.
In my example, I get that data using a csv-export, and fields are generally empty and not 0, and it is cumbersome to add the 0 there. Is there a way to have something like COALESCE in that sum there (like in SQL)?
Please let me know.

it seems like something quite a bit simpler would avoid the problem:
=SUMPRODUCT(Sheet1!C:C-Sheet1!B:B,Sheet1!A:A=A2)
for cell B2.

Why don't you just add this in cell A1 of Sheet2 instead of all the Query:
=arrayformula({Sheet1!A1,"balance";if(Sheet1!A2:A<>"",{Sheet1!A2:A,Sheet1!C2:C-Sheet1!B2:B},)})
Obviously ensure cells Sheet2!A2:A and Sheet2!B1:B are empty.
If you have duplicate values of foo, try:
=arrayformula(query({Sheet1!A1,"balance";if(Sheet1!A2:A<>"",{Sheet1!A2:A,Sheet1!C2:C-Sheet1!B2:B},)},"select Col1,sum(Col2) where Col1 is not null group by Col1 label sum(Col2) 'balance'",1))
A better option for a single-cell formula, referencing multiple sheets would be:
=arrayformula(query(
{Sheet1!A:A,n(Sheet1!B:C);Sheet2!A2:A,n(Sheet2!B2:C);Sheet3!A2:A,n(Sheet3!B2:C)},
"select Col1,sum(Col3)-sum(Col2) where Col1 is not null group by Col1 label sum(Col3)-sum(Col2) 'balance' ",1))

Google query multiple columns references

I have this query:
=QUERY(all!A:Z, "select B where (B=1 and H=true)")
How can I turn B and H into a column reference so I don't have to write them when I copy the query to new cells.
Note: column B is a number, meanwhile H contains boolean values.

try it like this:
=QUERY({all!A:Z}, "select Col2 where (Col2=1 and Col8=true)")

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

joining 2 tables with specific conditions - join

You could merge the (sorted) sets and use VVALUEX() to programmatically get the (character) value of a variable by name. Something like: data combined; merge first second; by row_num; value = vvaluex(field); keep m_id field row_num value; run; This assumes that row_num is unique in the first set.

Related

Aggregating rows with query in google sheets

How to Query function for multiple counts

How to get unique values in a column, including cells with multiple values seperated by commas, in Google Sheet?

How can I avoid having to put 0s into the NULL fields to get a correct query calculation in Google Sheets

Google query multiple columns references

Categories

Resources