Joining rows with columns in SAS

I have two tables.
The first lists all possible mistakes and looks like this:
mistake|description
m1 | a
m2 | b
m3 | c
The second table is my data:
n | m1 | m2 | m3
1 | 1 | 0 | 1
2 | 0 | 1 | 1
3 | 1 | 1 | 0
where n is the row number, and each m column holds 1 if the row has that mistake and 0 if it does not.
I want to join the two tables so that each mistake is listed with the row numbers (or other info) of the rows where it occurs.
Something like:
mistake | n
m1 |1
m1 |3
m2 |2
m2 |3
m3 |1
m3 |2

It looks to me like you are just asking to transpose the data.
data have;
input n m1 m2 m3 ;
cards;
1 1 0 1
2 0 1 1
3 1 1 0
;
/* transpose m1-m3 into rows (_NAME_, COL1) and keep only the flags set to 1 */
proc transpose data=have out=want(where=(col1=1)) ;
by n ;
var m1 m2 m3 ;
run;
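For readers who want to see the reshaping logic outside SAS, here is a minimal sketch in plain Python. It is not the SAS procedure itself, only an illustration of the same wide-to-long step on the question's data; the function name `unpivot_flags` is made up for this example.

```python
# Wide table from the question: one row per n, one 0/1 flag column per mistake.
rows = [
    {"n": 1, "m1": 1, "m2": 0, "m3": 1},
    {"n": 2, "m1": 0, "m2": 1, "m3": 1},
    {"n": 3, "m1": 1, "m2": 1, "m3": 0},
]
mistakes = ["m1", "m2", "m3"]


def unpivot_flags(rows, flag_columns):
    """Return (mistake, n) pairs for every flag that is set, ordered by mistake."""
    pairs = []
    for column in flag_columns:
        for row in rows:
            if row[column] == 1:
                pairs.append((column, row["n"]))
    return pairs


result = unpivot_flags(rows, mistakes)
for mistake, n in result:
    print(f"{mistake} | {n}")
```

Looping over the flag columns first gives the output ordered by mistake, as in the desired table; looping over rows first would order it by n instead.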

Google SpreadSheet Query - Merge queries results into one

Let's take this data in a Google sheet:
| Product | Green | Red | Date |
| A | 1 | 0 | 1/1/2020 |
| A | 1 | 0 | 2/1/2020 |
| B | 0 | 1 | 2/25/2020 |
| C | 1 | 0 | 2/28/2020 |
| A | 0 | 1 | 3/1/2020 |
My goal is to display the sum of Green and Red for each product:
from the beginning of the year,
for the current month.
I created this Google Query to get the results for the whole year:
=QUERY(DATA!A:D,"select A, sum(B), sum(C) where D >= date '2020-01-01' and D <= date '2020-12-31' group by A")
I get this result:
| Product | sum Green | sum Red |
| A | 2 | 1 |
| B | 0 | 1 |
| C | 1 | 0 |
And this query for the given month (I simplified the query, but I have a Settings sheet to specify the month to query):
=QUERY(DATA!A:D,"select A, sum(B), sum(C) where D >= date '2020-01-01' and D <= date '2020-01-31' group by A")
And get this result:
| Product | sum Green | sum Red |
| A | 1 | 0 |
Now I'm stuck on joining the two results into one, like this:
| Product | Year sum Green | Year sum Red | Jan sum Green | Jan sum Red |
| A | 2 | 1 | 1 | 0 |
| B | 0 | 1 | | |
| C | 1 | 0 | | |
How can I achieve this?
Thanks a lot for your help!
Try this, where F2:F holds the product names to look up:
=ARRAYFORMULA(IFNA(VLOOKUP(F2:F, QUERY(DATA!A:D,
"select A,sum(B),sum(C)
where month(D)+1 = 1
group by A
label sum(B)'Jan sum Green',sum(C)'Jan sum Red'"), {2,3}, 0)))
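The VLOOKUP wrapped in IFNA amounts to a left join: every product from the yearly result is kept, and products with no rows in the chosen month get blank cells. A minimal Python sketch of that logic on the question's data follows; it assumes the dates are month/day/year, and the helper name `sums` is made up for this example.

```python
from datetime import date

# Rows from the question: (product, green, red, date), dates read as month/day/year.
data = [
    ("A", 1, 0, date(2020, 1, 1)),
    ("A", 1, 0, date(2020, 2, 1)),
    ("B", 0, 1, date(2020, 2, 25)),
    ("C", 1, 0, date(2020, 2, 28)),
    ("A", 0, 1, date(2020, 3, 1)),
]


def sums(rows, keep):
    """Sum green and red per product over the rows whose date passes keep()."""
    totals = {}
    for product, green, red, day in rows:
        if keep(day):
            g, r = totals.get(product, (0, 0))
            totals[product] = (g + green, r + red)
    return totals


year = sums(data, lambda d: d.year == 2020)
january = sums(data, lambda d: d.year == 2020 and d.month == 1)

# Left join: keep every product of the yearly result, blank when the month has none.
merged = [(p, *year[p], *january.get(p, ("", ""))) for p in sorted(year)]
for row in merged:
    print(row)
```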

Maximum of column 1 where value of column 2 matches some condition

Let's say I have the following in a table :
row | A | B | desired_output
----------------------------
1 | 10 | 1 | 0
2 | 20 | 7 | 0
3 | 30 | 3 | 0
4 | 20 | 2 | 0
5 | 30 | 5 | 1
I'd like to find a formula for each of the cells in the desired_output column which looks at the max of B1:B5 but only for rows for which A = max(A1:A5)
If that's not clear, I'll try to put it another way :
for all the rows in A1:A5 that are equal to max(A1:A5) // so that's rows 3 and 5
find the one which has the max value on B // so between B3 and B5, that's B5
output 1 for this one, 0 for the other
I'd say there would be a where somewhere if such a function existed, something like = if(B=(max(B1:B5) where A = max(A1:A5)), 1, 0) but I can't find how to do it...
I can do it in two columns with a trick :
row | A | B | C | D
----------------------------
1 | 10 | 1 | | 0
2 | 20 | 7 | | 0
3 | 30 | 3 | 3 | 0
4 | 20 | 2 | | 0
5 | 30 | 5 | 5 | 1
With Cn = if(An=max(A$1:A$5),Bn,"") and Dn = if(Cn = max(C$1:C$5), 1, 0)
But I still can't find how to do it in one column
For systems without MAXIFS, put this in C1 and fill down.
=--(B1=MAX(INDEX(B$1:B$5-(A$1:A$5<>MAX(A$1:A$5))*1E+99, , )))
=ARRAYFORMULA(IF(LEN(A1:A), IF(IFERROR(VLOOKUP(CONCAT(A1:A&"×", B1:B),
JOIN("×", QUERY(A1:B, "order by A desc, B desc limit 1")), 1, 0), )<>"", 1, 0), ))
or shorter:
=ARRAYFORMULA(IF(A:A<>"",N(A:A&"×"&B:B=JOIN("×",SORTN(A:B,1,,1,0,2,0))),))
=ARRAYFORMULA(IF(A:A<>"",N(A:A&B:B=JOIN(,SORTN(A:B,1,,1,0,2,0))),))
How about the following:
=--AND(A5=MAX($A$1:$A$5),B5=MAXIFS($B$1:$B$5,$A$1:$A$5,MAX($A$1:$A$5)))
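The logic shared by these formulas can be checked with a short Python sketch on the question's data: find the maximum of A, then the maximum of B among the rows that have that A, and flag the rows matching both. This is only an illustration of the computation, not a spreadsheet formula.

```python
# (A, B) pairs from the question, in row order 1..5.
rows = [(10, 1), (20, 7), (30, 3), (20, 2), (30, 5)]

# Step 1: the largest value in column A.
max_a = max(a for a, _ in rows)

# Step 2: the largest B among the rows whose A equals that maximum.
max_b = max(b for a, b in rows if a == max_a)

# Step 3: flag rows that match on both A and B.
flags = [int(a == max_a and b == max_b) for a, b in rows]
print(flags)
```

Checking both conditions in step 3 matters: a row with a smaller A whose B happens to equal `max_b` must not be flagged.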

Compare each cell in two rows, but with an exception

I have a table that keeps track of scores from a test. It compares the row with someone's answers to the row with the correct data:
A B C D E
+--------------+-----+-----+-----+-------+
1 | | Q1 | Q2 | Q3 | Score |
+--------------+-----+-----+-----+-------+
2 | Answers | C | B | A | |
+--------------+-----+-----+-----+-------+
3 | George | C | A | B | 1 |
4 | Judith | C | C | A | 2 |
5 | James | A | B | C | 1 |
+--------------+-----+-----+-----+-------+
The formula behind the Score column is:
=arrayformula(sumproduct(($B$2:$D$2=B3:D3)))
The first part of sumproduct is a static reference to the Answers row. The second part compares it against the row the formula is on. However, I want to add an exception: if the Answers row contains an asterisk for a question, every answer to that question should count as correct:
A B C D E
+--------------+-----+-----+-----+-------+
1 | | Q1 | Q2 | Q3 | Score |
+--------------+-----+-----+-----+-------+
2 | Answers | C | * | A | |
+--------------+-----+-----+-----+-------+
3 | George | C | A | B | 2 |
4 | Judith | C | C | A | 3 |
5 | James | A | B | C | 1 |
+--------------+-----+-----+-----+-------+
How would I be able to do this?
Please try:
=arrayformula(sumproduct(($B$2:$D$2=B3:D3)+($B$2:$D$2="*")))
=IF(OR($B$2=B3, $B$2="*"), 1, )+
IF(OR($C$2=C3, $C$2="*"), 1, )+
IF(OR($D$2=D3, $D$2="*"), 1, )
this will cover up to 51 questions (columns / range of B:AZ)
=IF(LEN($B$2),IF(OR($B$2=B3,$B$2="*"),1,),)+
IF(LEN($C$2),IF(OR($C$2=C3,$C$2="*"),1,),)+
IF(LEN($D$2),IF(OR($D$2=D3,$D$2="*"),1,),)+
IF(LEN($E$2),IF(OR($E$2=E3,$E$2="*"),1,),)+
IF(LEN($F$2),IF(OR($F$2=F3,$F$2="*"),1,),)+
IF(LEN($G$2),IF(OR($G$2=G3,$G$2="*"),1,),)+
IF(LEN($H$2),IF(OR($H$2=H3,$H$2="*"),1,),)+
IF(LEN($I$2),IF(OR($I$2=I3,$I$2="*"),1,),)+
IF(LEN($J$2),IF(OR($J$2=J3,$J$2="*"),1,),)+
IF(LEN($K$2),IF(OR($K$2=K3,$K$2="*"),1,),)+
IF(LEN($L$2),IF(OR($L$2=L3,$L$2="*"),1,),)+
IF(LEN($M$2),IF(OR($M$2=M3,$M$2="*"),1,),)+
IF(LEN($N$2),IF(OR($N$2=N3,$N$2="*"),1,),)+
IF(LEN($O$2),IF(OR($O$2=O3,$O$2="*"),1,),)+
IF(LEN($P$2),IF(OR($P$2=P3,$P$2="*"),1,),)+
IF(LEN($Q$2),IF(OR($Q$2=Q3,$Q$2="*"),1,),)+
IF(LEN($R$2),IF(OR($R$2=R3,$R$2="*"),1,),)+
IF(LEN($S$2),IF(OR($S$2=S3,$S$2="*"),1,),)+
IF(LEN($T$2),IF(OR($T$2=T3,$T$2="*"),1,),)+
IF(LEN($U$2),IF(OR($U$2=U3,$U$2="*"),1,),)+
IF(LEN($V$2),IF(OR($V$2=V3,$V$2="*"),1,),)+
IF(LEN($W$2),IF(OR($W$2=W3,$W$2="*"),1,),)+
IF(LEN($X$2),IF(OR($X$2=X3,$X$2="*"),1,),)+
IF(LEN($Y$2),IF(OR($Y$2=Y3,$Y$2="*"),1,),)+
IF(LEN($Z$2),IF(OR($Z$2=Z3,$Z$2="*"),1,),)+
IF(LEN($AA$2),IF(OR($AA$2=AA3,$AA$2="*"),1,),)+
IF(LEN($AB$2),IF(OR($AB$2=AB3,$AB$2="*"),1,),)+
IF(LEN($AC$2),IF(OR($AC$2=AC3,$AC$2="*"),1,),)+
IF(LEN($AD$2),IF(OR($AD$2=AD3,$AD$2="*"),1,),)+
IF(LEN($AE$2),IF(OR($AE$2=AE3,$AE$2="*"),1,),)+
IF(LEN($AF$2),IF(OR($AF$2=AF3,$AF$2="*"),1,),)+
IF(LEN($AG$2),IF(OR($AG$2=AG3,$AG$2="*"),1,),)+
IF(LEN($AH$2),IF(OR($AH$2=AH3,$AH$2="*"),1,),)+
IF(LEN($AI$2),IF(OR($AI$2=AI3,$AI$2="*"),1,),)+
IF(LEN($AJ$2),IF(OR($AJ$2=AJ3,$AJ$2="*"),1,),)+
IF(LEN($AK$2),IF(OR($AK$2=AK3,$AK$2="*"),1,),)+
IF(LEN($AL$2),IF(OR($AL$2=AL3,$AL$2="*"),1,),)+
IF(LEN($AM$2),IF(OR($AM$2=AM3,$AM$2="*"),1,),)+
IF(LEN($AN$2),IF(OR($AN$2=AN3,$AN$2="*"),1,),)+
IF(LEN($AO$2),IF(OR($AO$2=AO3,$AO$2="*"),1,),)+
IF(LEN($AP$2),IF(OR($AP$2=AP3,$AP$2="*"),1,),)+
IF(LEN($AQ$2),IF(OR($AQ$2=AQ3,$AQ$2="*"),1,),)+
IF(LEN($AR$2),IF(OR($AR$2=AR3,$AR$2="*"),1,),)+
IF(LEN($AS$2),IF(OR($AS$2=AS3,$AS$2="*"),1,),)+
IF(LEN($AT$2),IF(OR($AT$2=AT3,$AT$2="*"),1,),)+
IF(LEN($AU$2),IF(OR($AU$2=AU3,$AU$2="*"),1,),)+
IF(LEN($AV$2),IF(OR($AV$2=AV3,$AV$2="*"),1,),)+
IF(LEN($AW$2),IF(OR($AW$2=AW3,$AW$2="*"),1,),)+
IF(LEN($AX$2),IF(OR($AX$2=AX3,$AX$2="*"),1,),)+
IF(LEN($AY$2),IF(OR($AY$2=AY3,$AY$2="*"),1,),)+
IF(LEN($AZ$2),IF(OR($AZ$2=AZ3,$AZ$2="*"),1,),)
and here is "Formula Generator" sheet for that
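As a cross-check of the expected scores, here is a minimal Python sketch of the same rule: a question scores a point when the key is an asterisk or the given answer equals the key, and questions with an empty key are skipped (mirroring the LEN check above). The function name `score` is made up for this example.

```python
# Answer key and responses from the question's second table.
key = ["C", "*", "A"]
answers = {
    "George": ["C", "A", "B"],
    "Judith": ["C", "C", "A"],
    "James": ["A", "B", "C"],
}


def score(key, given):
    """Count questions where the key is '*' or matches the given answer."""
    total = 0
    for expected, actual in zip(key, given):
        if expected == "":
            continue  # no key for this question, so nothing to score
        if expected == "*" or expected == actual:
            total += 1
    return total


scores = {name: score(key, given) for name, given in answers.items()}
print(scores)
```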

Neo4j Divide ( / ) by Zero ( 0 )

In neo4j I am querying
MATCH (n)-[t:x{x:"1a"}]->()
WHERE n.a > 1 OR n.b > 1 AND toFloat(n.a) / (n.a+n.b) * 100 < 90
RETURN DISTINCT n, toFloat(n.a) / (n.a + n.b) * 100
ORDER BY toFloat(n.a) / (n.a + n.b) * 100 DESC
LIMIT 10
but I get a "/ by zero" error.
Since I require that n.a or n.b be greater than 1, a row where both are zero should be skipped and I shouldn't get this error. This looks like a logic issue in Neo4j. There is no problem when I delete AND toFloat(n.a)/(n.a+n.b)*100 < 90 from the WHERE clause, but I only want results lower than 90. How can I overcome this?
Can either of n.a or n.b be negative? I was able to reproduce this with:
WITH -2 AS na, 2 AS nb
WHERE (na > 1 OR nb > 1) AND toFloat(na)/(na+nb)*100 < 90
RETURN na, nb
And I get: / by zero
Perhaps try changing your WHERE clause to:
WITH -2 AS na, 2 AS nb
WHERE (na + nb > 0) AND toFloat(na)/(na+nb)*100 < 90
RETURN na, nb
And I get: zero rows.
It seems the second condition, toFloat(na) / (na + nb) * 100 < 90, is tested before the first. Look at the Filter(1) operator in this execution plan:
+--------------+---------------+------+--------+--------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+--------------+---------------+------+--------+--------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Projection | 1 | 3 | 0 | anon[111], anon[138], n, toFloat(n.a)/(n.a + n.b)* 100 | anon[111]; anon[138] |
| Top | 1 | 3 | 0 | anon[111], anon[138] | { AUTOINT6}; |
| Distinct | 0 | 3 | 24 | anon[111], anon[138] | anon[111], anon[138] |
| Filter(0) | 0 | 3 | 6 | anon[29], n, t | t.x == { AUTOSTRING0} |
| Expand(All) | 1 | 3 | 6 | anon[29], n, t | ( n#7)-[t:x]->() |
| Filter(1) | 1 | 3 | 34 | n | (Ors(List(n#7.a > { AUTOINT1}, Multiply(Divide(ToFloatFunction( n#7.a),Add( n#7.a, n#7.b)),{ AUTOINT3}) < { AUTOINT4})) AND Ors(List( n#7.a > { AUTOINT1}, n.b > { AUTOINT2}))) |
| AllNodesScan | 4 | 4 | 5 | n | |
+--------------+---------------+------+--------+--------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
You can get around this by forcing the filter to be split into two clauses.
MATCH (n)-[t:x { x:"1a" }]->()
WHERE n.a > 1 OR n.b > 1
WITH n
WHERE toFloat(n.a) / (n.a + n.b) * 100 < 90
RETURN DISTINCT n, toFloat(n.a) / (n.a + n.b) * 100
ORDER BY toFloat(n.a) / (n.a + n.b) * 100 DESC
LIMIT 10
I found this behavior surprising, but on reflection I suppose it isn't wrong for the execution engine to rearrange the filter in this way. One might assume that evaluation stops early once the first declared condition fails, but Cypher is exactly that: declarative. We express the "what", not the "how", and in terms of the "what", A and B is equivalent to B and A.
Here is the query and a sample graph, you can check if it translates to your actual data:
http://console.neo4j.org/r/f6kxi5
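The two-stage idea can be sketched in Python on made-up sample nodes (the values below are hypothetical, not taken from the linked graph). Unlike the Cypher planner, Python's `and` does short-circuit left to right, so the sketch makes the guard an explicit first stage instead of relying on evaluation order, and it also checks the denominator directly, which covers the negative-value case shown earlier.

```python
# Hypothetical sample nodes; only the properties a and b matter here.
nodes = [
    {"a": -2, "b": 2},   # a + b == 0, would divide by zero
    {"a": 3, "b": 1},    # 75 percent
    {"a": 0, "b": 0},    # a + b == 0
    {"a": 19, "b": 1},   # 95 percent, fails the "< 90" test
    {"a": 0, "b": 5},    # 0 percent
]


def percent(node):
    """Share of a in a + b, as a percentage; the caller must ensure a + b != 0."""
    return float(node["a"]) / (node["a"] + node["b"]) * 100


# Stage 1: keep only nodes that are safe to divide and pass the threshold on a or b.
safe = [n for n in nodes if n["a"] + n["b"] != 0 and (n["a"] > 1 or n["b"] > 1)]

# Stage 2: the ratio is computed only on nodes that survived stage 1.
result = [(n["a"], n["b"], percent(n)) for n in safe if percent(n) < 90]

# ORDER BY ... DESC LIMIT 10
result = sorted(result, key=lambda item: item[2], reverse=True)[:10]
print(result)
```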

Counting unique values in Google Spreadsheet based on multiple columns

Let's take this spreadsheet for example:
ID | StoreName | StoreID | CheckinTime | User
0 | w1 | 1 | 10:00 | user1
1 | w5 | 1 | 10:01 | user2
2 | w2 | 1 | 10:01 | user1
3 | w1 | 1 | 10:01 | user4
4 | w5 | 1 | 10:05 | user1
5 | w3 | 1 | 10:05 | user6
6 | w1 | 1 | 10:05 | user1
7 | w1 | 1 | 10:05 | user1
Is there a way to create a new column/tab/sheet to count all the unique checkins for a store. So let's say; StoreName "w1" is visited by "user1" 3 times and 1 time by "user4". The expected output will be 2 (2 unique visitors for "w1"). This is the output I would like to have:
ID | StoreName | uniqueCheckins
0 | w1 | 2
1 | w2 | 1
2 | w3 | 1
3 | w4 | 0
4 | w5 | 2
To produce the StoreName and uniqueCheckins output columns:
=QUERY(QUERY(B:E,"select B, E, count(C) group by B, E",1),"select Col1, count(Col2) group by Col1 label count(Col2) 'uniqueCheckins'",1)
However this will omit any StoreName that doesn't appear in the raw data (in your example, w4). Would this be OK?
Updated my answer
=({FILTER(SORT(UNIQUE(Index(UNIQUE({B2:B,E2:E}),,1))),SORT(UNIQUE(Index(UNIQUE({B2:B,E2:E}),,1)))<>""),ARRAYFORMULA(COUNTIF(INDEX(UNIQUE({B2:B,E2:E}),,1),"="&FILTER(SORT(UNIQUE(Index(UNIQUE({B2:B,E2:E}),,1))),SORT(UNIQUE(Index(UNIQUE({B2:B,E2:E}),,1)))<>"")))})
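The counting itself can be illustrated with a short Python sketch that collects the distinct users per store in a set. To show a store with zero check-ins such as w4, it assumes a separate master list of store names exists; that list (`all_stores`) is an assumption of this example and is not part of the raw data in the question.

```python
# (StoreName, User) pairs from the question's sheet.
checkins = [
    ("w1", "user1"), ("w5", "user2"), ("w2", "user1"), ("w1", "user4"),
    ("w5", "user1"), ("w3", "user6"), ("w1", "user1"), ("w1", "user1"),
]

# Assumption: a master list of stores, needed to report stores with no check-ins.
all_stores = ["w1", "w2", "w3", "w4", "w5"]

# Start every known store with an empty set, then add each visiting user.
visitors = {store: set() for store in all_stores}
for store, user in checkins:
    visitors.setdefault(store, set()).add(user)

# The set size is the number of unique visitors.
unique_checkins = {store: len(users) for store, users in sorted(visitors.items())}
for store, count in unique_checkins.items():
    print(f"{store} | {count}")
```

Without the master list, the result covers only stores that appear in the raw data, which is the same limitation noted for the QUERY formula.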
