KDB: How to join tables with different column names

If I have the following tables:
t1:([] c1: 1 2 3; c2: 120 234 876)
t2:([] cd1:1 2; d: 999 899)
How can I join the tables on t1.c1 = t2.cd1, where c1 and cd1 are not linked columns?

You're looking to use a left join lj as follows:
q)t1: ([] c1: 1 2 3; c2: 120 234 876)
q)t2:([] cd1:1 2; d: 999 899)
q)t1 lj 1!`c1 xcol t2
c1 c2  d
----------
1  120 999
2  234 899
3  876
where we use xcol to rename the column cd1 in t2 to match c1 in t1.
You can read more on joins at https://code.kx.com/q/ref/joins/
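For readers less familiar with q, the rename-then-left-join idea can be sketched in plain Python. This is only an illustration of the logic, not how q implements lj; the helper names rename_column and left_join are hypothetical, and the choice to keep the first right-hand row per key is an assumption of this sketch.

```python
# Tables from the question, as lists of row dictionaries.
t1 = [{"c1": 1, "c2": 120}, {"c1": 2, "c2": 234}, {"c1": 3, "c2": 876}]
t2 = [{"cd1": 1, "d": 999}, {"cd1": 2, "d": 899}]


def rename_column(rows, old, new):
    """Return copies of the rows with column `old` renamed to `new`."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in rows]


def left_join(left, right, key):
    """Keep every row of `left`; add the other columns of the matching `right` row."""
    lookup = {}
    for row in right:
        # Assumption of this sketch: the first row per key wins.
        lookup.setdefault(row[key], row)
    extra_cols = [c for c in right[0] if c != key]
    joined = []
    for row in left:
        match = lookup.get(row[key], {})
        # Unmatched rows get None, playing the role of the null in the q output.
        joined.append({**row, **{c: match.get(c) for c in extra_cols}})
    return joined


result = left_join(t1, rename_column(t2, "cd1", "c1"), "c1")
for row in result:
    print(row)
```

The row with c1 = 3 has no partner in t2, so its d value stays empty, just like the blank cell in the q result.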

Related

Find column number of last match in a row in sheets

In this table it's easy to find that column E is the first match for the value 3.
How do I find the column of the last match of 3, which will be column I?
A B C D E F G H I J K L
6 6 9 9 3 3 2 2 3 1 1 1
Use this formula
=ArrayFormula(Substitute(Address(1,MAX(IF(REGEXMATCH(A1:L1,3&"")<>TRUE,,COLUMN(A1:L1))),4),"1",""))
try:
=SUBSTITUTE(ADDRESS(2, XMATCH(3, A2:P2,, -1), 4), 2, )
=ADDRESS(2, XMATCH(3, A2:P2,, -1), 4)
=XLOOKUP(3, A2:P2, A1:P1,,, -1)
XMATCH has a reverse search feature. Set search_mode to -1 to activate:
=INDEX(1:1,XMATCH(3,2:2,,-1))
(A1) A B C D E F G H I J K L
     6 6 9 9 3 3 2 2 3 1 1 1
Result:
I
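The reverse search used by the answers above can be sketched outside of Sheets as a scan from the right-hand end of the row. This is a minimal illustration using the question's data; the function name last_match_header is hypothetical.

```python
headers = list("ABCDEFGHIJKL")
values = [6, 6, 9, 9, 3, 3, 2, 2, 3, 1, 1, 1]


def last_match_header(headers, values, target):
    """Walk the row from right to left and return the header of the first hit."""
    for header, value in zip(reversed(headers), reversed(values)):
        if value == target:
            return header
    return None  # target does not occur in the row


print(last_match_header(headers, values, 3))  # I
```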

Google Sheets QUERY with dropdown menu + multiple conditions

I have this kind of table on a tab (called "Log"):
A B C D E F G H
a1 b1 c1 d1 5 f1 g1 h1
a2 b1 c2 d1 3 f2 g2
a3 b2 c1 d2 4 f3 g3 h2
a4 b1 d1 5 f4 g4
a5 b2 c3 d1 3 f5 g5 h3
On another tab (called "Watch") of the same file I have a dropdown menu with all the "D" values.
On the "Watch" tab, I'm trying to use the QUERY function to display C, E, G and H. C, E and H must always be shown, while G is needed only where E is between 1 and 3.
The closest I got was this:
=QUERY(Log!B:H, "SELECT C,E,H,G WHERE D='"&B1&"' and H is not null and E<=3")
but it only shows rows where E is <= 3 and ignores the choice from the dropdown menu (WHERE D='"&B1&"').
try:
=QUERY(Log!B:H,
"select C,E,H,G
where lower(D) = '"&TRIM(LOWER(B1))&"'
and H is not null
and E<=3", 0)
Try with this:
=filter({Log!C1:C5,Log!E1:E5,Log!H1:H5,arrayformula(if((Log!H1:H5="")*(Log!E1:E5<=3),Log!G1:G5,""))}, Log!D1:D5="d1")
or
=filter({Log!C1:C5,Log!E1:E5,Log!H1:H5,arrayformula(if((Log!H1:H5="")*(Log!E1:E5<=3),Log!G1:G5,""))}, Log!D1:D5 = B1)
or
=filter({Log!C1:C5,Log!E1:E5, Log!H1:H5,if((Log!H1:H5="")*(Log!E1:E5<=3)=1,Log!G1:G5,"")},Log!D1:D5="d1")
The result:
c1  5  h1
c2  3      g2
    5
c3  3  h3
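The logic of the FILTER formulas above (keep rows whose D matches the dropdown, always show C, E and H, and show G only when H is empty and E <= 3) can be sketched in Python. The data is the "Log" table from the question; the function name watch and the use of empty strings for blank cells are assumptions of this sketch.

```python
log = [
    # (A,    B,    C,    D,    E, F,    G,    H)
    ("a1", "b1", "c1", "d1", 5, "f1", "g1", "h1"),
    ("a2", "b1", "c2", "d1", 3, "f2", "g2", ""),
    ("a3", "b2", "c1", "d2", 4, "f3", "g3", "h2"),
    ("a4", "b1", "",   "d1", 5, "f4", "g4", ""),
    ("a5", "b2", "c3", "d1", 3, "f5", "g5", "h3"),
]


def watch(rows, selected_d):
    """Return (C, E, H, G-or-blank) for every row whose D equals the dropdown value."""
    out = []
    for a, b, c, d, e, f, g, h in rows:
        if d != selected_d:
            continue
        # G is shown only when H is blank and E is at most 3, as in the formula.
        show_g = h == "" and e <= 3
        out.append((c, e, h, g if show_g else ""))
    return out


for row in watch(log, "d1"):
    print(row)
```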

PSQL: ROW_NUMBER incremented continuously

I have the following tables. T1:
field1 | field3
--------+--------
A1 | foo
A2 | v1
A3 | v2
A4 | bar
and T2:
field2 | field3
--------+--------
B1 | foo
B2 | bar
If I do the following request:
SELECT DISTINCT ON (T2.field2, T2.field3)
T2.field2 AS F2,
T2.field3 AS F3,
ROW_NUMBER () OVER (ORDER BY T2.field3) AS F4
FROM T2
JOIN T1 ON T2.field3=T1.field3
... I get the following result :
F2:B1, F3:foo, F4:1
F2:B2, F3:bar, F4:4 // I would like F4:2
But I would like F4 to be incremented one by one... I think it is because of the join with T1 but I don’t know how to isolate the ROW_NUMBER...
This works:
SELECT DISTINCT ON (T2.field2, T2.field3)
T2.field2 AS F2,
T2.field3 AS F3,
ROW_NUMBER () OVER (ORDER BY T2.field3 DESC) AS F4
FROM T2
JOIN T1 ON T2.field3=T1.field3;
The following also works, and DENSE_RANK() won't double count.
On line 302 of https://github.com/pavankat/fantasy-football/blob/master/db/queries.sql I go over why I use DENSE_RANK instead of ROW_NUMBER or RANK:
SELECT DISTINCT ON (T2.field2, T2.field3)
T2.field2 AS F2,
T2.field3 AS F3,
DENSE_RANK() OVER (ORDER BY T2.field3 DESC) AS F4
FROM T2
JOIN T1 ON T2.field3=T1.field3;
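To see why DENSE_RANK() avoids the gap, it may help to simulate both window functions over a join result that contains duplicates. The sketch below is plain Python, not PostgreSQL; the joined rows are hypothetical (three T1 rows matching 'foo', one matching 'bar'), and distinct_on simply keeps the first row per group, whereas DISTINCT ON without a matching ORDER BY does not guarantee which row is kept.

```python
from itertools import groupby

# Hypothetical output of T2 JOIN T1: (field2, field3), with 'foo' matched three times.
joined = [("B1", "foo"), ("B1", "foo"), ("B1", "foo"), ("B2", "bar")]

# ORDER BY field3 DESC (sorted() is stable, so ties keep their order).
ordered = sorted(joined, key=lambda r: r[1], reverse=True)

# ROW_NUMBER(): every joined row gets its own number, duplicates included.
row_number = [(f2, f3, n) for n, (f2, f3) in enumerate(ordered, start=1)]

# DENSE_RANK(): rows with the same field3 share one rank, with no gaps.
dense_rank = []
for rank, (_, group) in enumerate(groupby(ordered, key=lambda r: r[1]), start=1):
    dense_rank.extend((f2, f3, rank) for f2, f3 in group)


def distinct_on(rows):
    """Keep the first row seen for each (field2, field3) pair."""
    seen = {}
    for f2, f3, n in rows:
        seen.setdefault((f2, f3), n)
    return [(f2, f3, n) for (f2, f3), n in seen.items()]


print(distinct_on(row_number))  # [('B1', 'foo', 1), ('B2', 'bar', 4)]
print(distinct_on(dense_rank))  # [('B1', 'foo', 1), ('B2', 'bar', 2)]
```

The numbering happens before duplicates are removed, so ROW_NUMBER() has already counted the extra 'foo' rows by the time 'bar' is reached.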

How to list most frequent text values within a range?

I'm an intermediate Excel user trying to solve an issue that feels a little over my head. Basically, I'm working with a spreadsheet which contains a number of orders associated with customer account #s, each of which has up to 5 metadata "tags" associated with it. I want to use the customer account # to pull the 5 most commonly occurring metadata tags in order.
Here is a mock up of the first set of data
Account Number Order Number Metadata
5043 1 A B C D
4350 2 B D
4350 3 B C
5043 4 A D
5043 5 C D
1204 6 A B
5043 7 A D
1204 8 D B
4350 9 B D
5043 10 A C D
and the end result I'm trying to create
Account Number Most Common Tag 2nd 3rd 4th 5th
5043 A C B N/A
4350 B D C N/A N/A
1204 B A C N/A N/A
I was trying to work with the formula suggested here:
=ARRAYFORMULA(INDEX(A1:A7,MATCH(MAX(COUNTIF(A1:A7,A1:A7)),COUNTIF(A1:A7,A1:A7),0)))
But I don't know a) how to use the customer account # as a precondition for counting the text values within the range, b) how to circumvent the fact that the MATCH formula only wants to work with a single column of data, and c) how to read the 2nd, 3rd, 4th, and 5th most common values from this range.
The way I'm formatting this data isn't set in stone. I suspect the way I'm organizing this information is holding me back from simpler solutions, so any suggestions on re-thinking my organization would be just as helpful as insights on how to create a formula to do this.
Implementing this kind of frequency analysis using built-in functions is likely to be a frustrating exercise. Since you are working with Google Sheets, take advantage of custom functions, written in JavaScript and placed into a script bound to the sheet (Tools > Script Editor).
The function I wrote for this purpose is below. Entering something like =tagfrequency(A2:G100) in the sheet will produce the desired output:
+----------------+-----------------+-----+-----+-----+-----+
| Account Number | Most Common Tag | 2nd | 3rd | 4th | 5th |
| 5043 | D | A | C | B | N/A |
| 4350 | B | D | C | N/A | N/A |
| 1204 | B | A | D | N/A | N/A |
+----------------+-----------------+-----+-----+-----+-----+
Custom function
function tagFrequency(arr) {
  var dict = {}; // the object in which to store tag counts
  for (var i = 0; i < arr.length; i++) {
    var acct = arr[i][0];
    if (acct == '') {
      continue; // ignore empty rows
    }
    if (!dict[acct]) {
      dict[acct] = {}; // new account number
    }
    for (var j = 2; j < arr[i].length; j++) {
      var tag = arr[i][j];
      if (tag) {
        if (!dict[acct][tag]) {
          dict[acct][tag] = 0; // new tag
        }
        dict[acct][tag]++; // increment tag count
      }
    }
  }
  // end of recording, begin sorting and output
  var output = [['Account Number', 'Most Common Tag', '2nd', '3rd', '4th', '5th']];
  for (acct in dict) {
    var tags = dict[acct];
    var row = [acct].concat(Object.keys(tags).sort(function (a,b) {
      return (tags[a] < tags[b] ? 1 : (tags[a] > tags[b] ? -1 : (a > b ? 1 : -1)));
    })); // sorting by tag count, then tag name
    while (row.length < 6) {
      row.push('N/A'); // add N/A if needed
    }
    output.push(row); // add row to output
  }
  return output;
}
You also could get this report:
Account Number Tag count
1204 B 2
1204 A 1
1204 D 1
4350 B 3
4350 D 2
4350 C 1
5043 D 5
5043 A 4
5043 C 3
5043 B 1
with the formula:
=QUERY(
{TRANSPOSE(SPLIT(JOIN("",ArrayFormula(REPT(FILTER(A2:A,A2:A<>"")&",",5))),",")),
TRANSPOSE(SPLIT(ArrayFormula(CONCATENATE(FILTER(C2:G,A2:A<>"")&" ,")),",")),
TRANSPOSE(SPLIT(rept("1,",counta(A2:A)*5),","))
},
"select Col1, Col2, Count(Col3) where Col2 <>' ' group by Col1, Col2
order by Col1, Count(Col3) desc label Col1 'Account Number', Col2 'Tag'")
The formula will count the number of occurrences of any tag.
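The same long-format report can be sketched in Python, which may make the counting step easier to follow than the nested SPLIT/REPT formula. The data is the mock-up from the question; breaking ties between equal counts by tag name is an assumption of this sketch, since the QUERY formula does not specify a tie order.

```python
from collections import Counter

# (account number, tags) for each order in the question's mock-up.
orders = [
    (5043, ["A", "B", "C", "D"]),
    (4350, ["B", "D"]),
    (4350, ["B", "C"]),
    (5043, ["A", "D"]),
    (5043, ["C", "D"]),
    (1204, ["A", "B"]),
    (5043, ["A", "D"]),
    (1204, ["D", "B"]),
    (4350, ["B", "D"]),
    (5043, ["A", "C", "D"]),
]

# Count every (account, tag) pair, like "group by Col1, Col2".
counts = Counter((account, tag) for account, tags in orders for tag in tags)

# Order by account, then count descending, then tag name (tie-break assumption).
report = sorted(
    ((account, tag, n) for (account, tag), n in counts.items()),
    key=lambda r: (r[0], -r[2], r[1]),
)

for account, tag, n in report:
    print(account, tag, n)
```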

How to get a single value from a cell-range by matching multiple columns and rows

I'm struggling with this one.
Here is data from 'sheet1':
|| A B C D E
=========================================
1 || C1 C2 X1 X2 X3
.........................................
2 || a b 1 2 3
3 || a d 10 11 12
4 || c d 4 5 6
5 || c f 13 14 15
6 || e f 7 8 9
7 || e b 16 17 18
Here's data in "sheet2":
|| A B C D
=================================
1 || C1 C2 C3 | val
.................................
2 || a d X2 | ?
3 || c f X1 | ?
4 || e b X3 | ?
Note that column C in sheet2 holds values equal to column names in sheet1.
I simply want to match A, B and C in sheet2 against columns A and B and header row 1 in sheet1 to find the values in the last column:
|| A B C D
=================================
1 || C1 C2 C3 | val
.................................
2 || a d X2 | 11
3 || c f X1 | 13
4 || e b X3 | 18
I've been playing with OFFSET() and MATCH() but can't seem to lock down on one cell using multiple search criteria. Can someone help please?
I would use this function in sheet2 D2 field:
=index(filter(sheet1!C:E,sheet1!A:A=A2,sheet1!B:B=B2),1,match(C2,sheet1!$C$1:$E$1,0))
Explanation:
The FILTER function returns the X1, X2 and X3 values (columns C, D and E of sheet1) of the row that matches these two conditions:
C1 is "a"
C2 is "d"
So it returns the array [10,11,12], which holds the values of the X1, X2 and X3 columns (C, D, E) of sheet1 in the matching row.
The INDEX function then takes this array, and we only need to determine which value to pick. MATCH does this by looking up the third condition, C3 (in this case "X2"), in the header row of sheet1. In this example it returns 2, as X2 is in the 2nd position of sheet1!C1:E1.
So the INDEX function returns the 2nd element of the array [10,11,12], which is 11, the desired value.
Hope this helps.
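The three steps described above (FILTER the row by two keys, MATCH the column name in the header, INDEX the value) can be sketched in Python as follows. The data comes from the question's sheet1; the function name lookup is hypothetical, and unlike the formula's MATCH over C1:E1 this sketch searches the whole header row.

```python
header = ["C1", "C2", "X1", "X2", "X3"]
sheet1 = [
    ["a", "b", 1, 2, 3],
    ["a", "d", 10, 11, 12],
    ["c", "d", 4, 5, 6],
    ["c", "f", 13, 14, 15],
    ["e", "f", 7, 8, 9],
    ["e", "b", 16, 17, 18],
]


def lookup(c1, c2, column_name):
    """Return the cell at the row matching (c1, c2) and the named column."""
    col = header.index(column_name)  # MATCH step; raises ValueError for unknown names
    for row in sheet1:               # FILTER step: first row matching both keys
        if row[0] == c1 and row[1] == c2:
            return row[col]          # INDEX step
    return None                      # no row matched both keys


print(lookup("a", "d", "X2"))  # 11
print(lookup("c", "f", "X1"))  # 13
print(lookup("e", "b", "X3"))  # 18
```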
