How to only KEEP almost similar records in cognos - cognos-11

You're right!
This is what we are trying to achieve:
For each OrderID, Keep ONLY those OrderIDs where where their ColumnD has at least 2 SAME consecutive values AND where ColumnC has at least 2 same value in it.
So for example we would be keeping the first 2 rows of OrderID 101
So for example we would be keeping the 3 rows of OrderID 104
So for example we would be keeping the 2 rows of OrderID 305
The rest we don't want to see in the report!
Here is an image that might help explain what we want to achieve.
OrderID
ColumnB
ColumnC
ColumnD
101
159
10
18$
101
132
10
18$
101
147
22
18$
102
111
12
55$
103
130
10
18$
104
123
381
75$
104
456
381
75$
104
789
381
75$
305
555
101
37$
305
652
101
37$

Related

How T Transpose Multiple Columns Values by Groups between groups delimiters in adjacent Column Google Sheets?

I have the following minimal example data (in reality 100's of groups) in range A1:P9 (same data in range A14:A22):
With Sample A1:AR9:
2
61
219
2
4
2
:
61
219
26
26
26
94
21
33
4
26
26
26
94
2
2
:
154
26
40
19
3
2
21
33
14
1
2
3
:
87
39
54
38
26
32
38
26
32
87
39
54
38
26
23
23
4
6
28
2
154
26
2
2
40
19
14
87
39
54
38
26
32
38
26
32
87
39
54
38
26
1
23
2
23
4
4
3
6
20
28
Or Sample A14:AQ22:
2
61
219
2
:
61
219
4
:
26
26
26
94
2
:
21
33
4
26
26
26
94
2
:
154
26
2
:
40
19
3
2
21
33
14
:
87
39
54
38
26
32
38
26
32
87
39
54
38
26
1
:
23
2
:
23
4
:
3
6
20
2
154
26
2
2
40
19
14
87
39
54
38
26
32
38
26
32
87
39
54
38
26
1
23
2
23
4
4
3
6
20
28
I need the output as shown in range Q1:AR3 or as in range Q14:AQ16.
Basically, at each group delimited/inbetween values in Column A, I would need:
The intemediary adjacent values in Column B to be transposed horizontally
And the adjacent content of Columns C to P (14 Columns, at least) to be "joined" together horizontaly an sequencialy "per group", including the content of the delimiter's row (in Column A).
As a bonus it would be really nice to have the Transposed data followed by a :, and each sub Content of Columns C to P to be also separated by a | (as shown in screenshot Q1:AR3 or Q14:AR16).
(Or if it's more feasible, alternatively, the simpler to read 2nd model as in A14:AQ22).
I have a really hard time putting together a formula to come to the expected result.
All I could think of was:
Transposing Column B's content by getting the rows of the adjacent Cells with values in column A,
Concatenating with the Column letter,
Duplicating it in a new column, and Filtering out the blank intermediary cells,
Then shifting the duplicated column 1 cell up,
Then concatenating within a TRANSPOSE formula to get the range of the groups,
Then finally transposing all the groups from Columns B in a new Colum
(very convoluted but I couldn't find better way).
To get to that input:
=TRANSPOSE(B1:B3)
=TRANSPOSE(B4:B5)
=TRANSPOSE(B7:B9)
That was already a very manual and error prone process, and still I could not successfully think of how to do the remaining content joining of Column C to P in a formula.
I tested the following approach but it's not working and would be very tedious process to fix to go and to implement on large datasets:
=TRANSPOSE(B1:B3)&": "&JOIN( " | " , FILTER(C1:P1, NOT(C2:P2 = "") ))&JOIN( " | " , FILTER(C2:P2, NOT(C2:P2 = "") ))&JOIN( " | " , FILTER(C43:P3, NOT(C3:P3 = "") ))
=TRANSPOSE(B4:B5)&": "&JOIN( " | " , FILTER(C4:P4, NOT(C4:P4 = "") ))&JOIN( " | " , FILTER(C5:P5, NOT(C5:P5 = "") ))
=TRANSPOSE(B6:B9)&": "&JOIN( " | " , FILTER(C6:P6, NOT(C6:P6 = "") ))&JOIN( " | " , FILTER(C7:P7, NOT(C7:P7 = "") ))&JOIN( " | " , FILTER(C8:P8, NOT(C8:P8 = "") ))&JOIN( " | " , FILTER(C8:P8, NOT(C9:P9 = "") ))
What better approach to favor toward the expected result? Preferably with a Formula, or if not possible with a script.
Any help is greatly appreciated.
For Sample 1 try this out:
=LAMBDA(norm,MAP(UNIQUE(norm),LAMBDA(ζ,{TRANSPOSE(FILTER(B1:B9,norm=ζ)),":",SPLIT(BYROW(TRANSPOSE(FILTER(BYROW(C1:P9,LAMBDA(r,TEXTJOIN("ζ",1,r))),norm=ζ)),LAMBDA(rr,TEXTJOIN("γ|γ",1,rr))),"ζγ")})))(SORT(SCAN(,SORT(A1:A9,ROW(A1:A9),),LAMBDA(a,c,IF(c="",a,c))),ROW(A1:A9),))

If number in range was >= 100 and subsequently <100

Anyone got any ideas on how to do this?
I'm trying to build a spreadsheet that helps me monitor the performance of my blog articles. So if the article historically had >=100 visits at any point but subsequently gets <100 at any point I want to know about it.
The formula I've been playing with is:
=IF(((FILTER(C2:G2,C2:G2<>E2))>=100 AND (FILTER(C2:G2,C2:G2<>E2))<100, "Article Failing", ""))
I'm using Filter btw because I need to exclude column E, which is the delta between this month's & last month's numbers.
I know the formula isn't logically right but struggling to think of a way to do it.
Edit:
Here's a link to the spreadsheet with desired output https://docs.google.com/spreadsheets/d/1TeaQ6oUbJDeKxUi8tvvCWXtw0oK9d5IVO60j1UbQCK8/edit?usp=sharing
Here's a table showing the sample data and desired output:
Total users (last 30 days)
Total users (prev 30 days)
Delta - Total users
Total users last 30-60 days
Total users prev 60-90 days
Delta - Total users
Above 100
Article Failing
651
90
-417
772
249
523
Tweak Article
Failing
610
570
40
550
432
118
Tweak Article
OK
436
409
27
328
210
118
Tweak Article
OK
422
288
134
53
288
-235
Tweak Article
OK
95
476
-90
417
477
-60
Below100
Failing
337
179
158
129
182
-53
Tweak Article
OK
305
395
-90
318
343
-25
Tweak Article
OK
304
348
-44
299
253
46
Tweak Article
OK
302
277
25
283
317
-34
Tweak Article
OK
286
252
34
268
281
-13
Tweak Article
OK
213
193
20
221
168
53
Tweak Article
OK
157
138
19
132
166
-34
Tweak Article
OK
150
157
-7
110
68
42
Tweak Article
OK
I've made cells B2 & A6 be failing articles i.e. they were >=100 but have since gone below 100. The end column 'Article Failing' is where I'm trying to create the formula.
Hope that makes things a bit clearer.
This formula will match the desired results you show in the sample spreadsheet:
=if(
(max(A$2:A2) >= 100) * (A2 < 100)
+
(max(B$2:B2) >= 100) * (B2 < 100)
+
(row(B2) = row(B$2)) * (B2 < 100),
"Failing",
"OK"
)

Calculate statistical difference from pivot table

I've built this table:
s_male Values
0 1
hs_name1 AVERAGE of sat_composite STDEV of sat_composite COUNT of s_lasid AVERAGE of sat_composite STDEV of sat_composite COUNT of s_lasid
Hope High School 986 600 639 979 630 579
James High School 837 568 473 830 612 428
Juniper High School 789 525 538 722 577 466
Kennedy High School 531 468 314 523 484 239
King High School 683 540 275 619 569 258
Lincoln High School 842 538 354 933 534 279
Meadowbrook High School 484 517 292 484 507 274
North Falls High School 1056 531 590 1046 547 564
Orange High School 905 597 555 828 619 526
Polk High School 680 569 567 691 568 501
South Falls High School 898 602 488 904 584 461
Upper Hills High School 457 491 349 431 490 248
Washington High School 795 609 482 818 635 401
Grand Total 801 585 5916 796 603 5224
Alos pictured here:
I now want to calculate if the average SAT_composite score for women (s_male=0) is statistically different than for men (s_male=1).
I've been trying to figure this out and I am a little lost. Any help would be greatly appreciated.
What you're looking for is just a regular t-test. In google sheets the syntax is:
=TTEST(B2:B8,C2:C8,2,2)
That will give you the p value associated with the test
Arguments:
Range for male scores
Range for female scores
How many tails? 1 or 2
Type of t-test. This choice is important and depends on whether certain assumptions have been violated (primarily independence and homoscedasticity)
I would agree with Toms suggestion in the comments about going back to the raw data as you'll be able to better model your data (in your example you've already collapsed across schools by calculating means, which loses information)

torch / lua: retrieving n-best subset from Tensor

I have following code now, which stores the indices with the maximum score for each question in pred, and convert it to string.
I want to do the same for n-best indices for each question, not just single index with the maximum score, and convert them to string. I also want to display the score for each index (or each converted string).
So scores will have to be sorted, and pred will have to be multiple rows/columns instead of 1 x nqs. And corresponding score value for each entry in pred must be retrievable.
I am clueless as to lua/torch syntax, and any help would be greatly appreciated.
nqs=dataset['question']:size(1);
scores=torch.Tensor(nqs,noutput);
qids=torch.LongTensor(nqs);
for i=1,nqs,batch_size do
xlua.progress(i, nqs)
r=math.min(i+batch_size-1,nqs);
scores[{{i,r},{}}],qids[{{i,r}}]=forward(i,r);
end
tmp,pred=torch.max(scores,2);
answer=json_file['ix_to_ans'][tostring(pred[{i,1}])]
print(answer)
Here is my attempt, I demonstrate its behavior using a simple random scores tensor:
> scores=torch.floor(torch.rand(4,10)*100)
> =scores
9 1 90 12 62 1 62 86 46 27
7 4 7 4 71 99 33 48 98 63
82 5 73 84 61 92 81 99 65 9
33 93 64 77 36 68 89 44 19 25
[torch.DoubleTensor of size 4x10]
Now, since you want the N best indexes for each question (row), let's sort each row of the tensor:
> values,indexes=scores:sort(2)
Now, let's look at what the return tensors contain:
> =values
1 1 9 12 27 46 62 62 86 90
4 4 7 7 33 48 63 71 98 99
5 9 61 65 73 81 82 84 92 99
19 25 33 36 44 64 68 77 89 93
[torch.DoubleTensor of size 4x10]
> =indexes
2 6 1 4 10 9 5 7 8 3
2 4 1 3 7 8 10 5 9 6
2 10 5 9 3 7 1 4 6 8
9 10 1 5 8 3 6 4 7 2
[torch.LongTensor of size 4x10]
As you see, the i-th row of values is the sorted version (in increasing order) of the i-th row of scores, and each row in indexes gives you the corresponding indexes.
You can get the N best values/indexes for each question (i.e. row) with
> N_best_indexes=indexes[{{},{indexes:size(2)-N+1,indexes:size(2)}}]
> N_best_values=values[{{},{values:size(2)-N+1,values:size(2)}}]
Let's see their values for the given example, with N=3:
> return N_best_indexes
7 8 3
5 9 6
4 6 8
4 7 2
[torch.LongTensor of size 4x3]
> return N_best_values
62 86 90
71 98 99
84 92 99
77 89 93
[torch.DoubleTensor of size 4x3]
So, the k-th best value for question j is N_best_values[{{j},{values:size(2)-k+1}]], and its corresponding index in the scores matrix is given by this row, column values:
row=j
column=N_best_indexes[{{j},indexes:size(2)-k+1}}].
For example, the first best value (k=1) for the second question is 99, which lies at the 2nd row and 6th column in scores. And you can see that values[{{2},values:size(2)}}] is 99, and that indexes[{{2},{indexes:size(2)}}] gives you 6, which is the column index in the scores matrix.
Hope that I explained my solution well.

how to join two pandas dataframe on specific column

I have 1st pandas dataframe which looks like this
order_id buyer_id caterer_id item_id qty_purchased
387 139 1 7 3
388 140 1 6 3
389 140 1 7 3
390 36 1 9 3
391 79 1 8 3
391 79 1 12 3
391 79 1 7 3
392 72 1 9 3
392 72 1 9 3
393 65 1 9 3
394 65 1 10 3
395 141 1 11 3
396 132 1 12 3
396 132 1 15 3
397 31 1 13 3
404 64 1 14 3
405 146 1 15 3
And the 2nd dataframe looks like this
item_id meal_type
6 Veg
7 Veg
8 Veg
9 NonVeg
10 Veg
11 Veg
12 Veg
13 NonVeg
14 Veg
15 NonVeg
16 NonVeg
17 Veg
18 Veg
19 NonVeg
20 Veg
21 Veg
I want to join this two data frames on item_id column. So that the final data frame should contain item_type where it has a match with item_id.
I am doing following in python
pd.merge(segments_data,meal_type,how='left',on='item_id')
But it gives me all nan values
You have to check types by dtypes of both columns (names) to join on.
If there are different, you can cast them, because you need same dtypes. Sometimes numeric columns are string columns, but looks like numbers.
If there are both same string types, maybe help cast both of them to int. Problem can be some whitespaces:
segments_data['item_id'] = segments_data['item_id'].astype(int)
meal_type['item_id'] = meal_type['item_id'].astype(int)
pd.merge(segments_data,meal_type,how='left',on='item_id')

Resources