How to join two pandas DataFrames on a specific column

My first pandas DataFrame looks like this:
order_id buyer_id caterer_id item_id qty_purchased
387 139 1 7 3
388 140 1 6 3
389 140 1 7 3
390 36 1 9 3
391 79 1 8 3
391 79 1 12 3
391 79 1 7 3
392 72 1 9 3
392 72 1 9 3
393 65 1 9 3
394 65 1 10 3
395 141 1 11 3
396 132 1 12 3
396 132 1 15 3
397 31 1 13 3
404 64 1 14 3
405 146 1 15 3
And the second DataFrame looks like this:
item_id meal_type
6 Veg
7 Veg
8 Veg
9 NonVeg
10 Veg
11 Veg
12 Veg
13 NonVeg
14 Veg
15 NonVeg
16 NonVeg
17 Veg
18 Veg
19 NonVeg
20 Veg
21 Veg
I want to join these two DataFrames on the item_id column, so that the final DataFrame contains meal_type wherever there is a match on item_id.
I am doing the following in Python:
pd.merge(segments_data,meal_type,how='left',on='item_id')
But it gives me all NaN values.

Check the dtypes of the two columns you are joining on.
If they differ, cast them, because the merge needs matching dtypes. Numeric-looking columns are sometimes stored as strings.
If both are strings, casting both of them to int may help. The problem can also be stray whitespace:
segments_data['item_id'] = segments_data['item_id'].astype(int)
meal_type['item_id'] = meal_type['item_id'].astype(int)
pd.merge(segments_data,meal_type,how='left',on='item_id')
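To make the dtype check concrete, here is a minimal sketch. The two small frames are made up for illustration and assume that item_id was read as padded strings in one of them; your real data may differ.

```python
import pandas as pd

# Hypothetical sample data: item_id is an int in one frame and a padded string in the other.
segments_data = pd.DataFrame({"order_id": [387, 388, 390], "item_id": [7, 6, 9]})
meal_type = pd.DataFrame(
    {"item_id": [" 6", "7 ", "9"], "meal_type": ["Veg", "Veg", "NonVeg"]}
)

# Inspect the key dtypes first; a mismatch here is the usual cause of failed matches.
print(segments_data["item_id"].dtype, meal_type["item_id"].dtype)

# Strip stray whitespace, then cast both key columns to the same integer dtype.
meal_type["item_id"] = meal_type["item_id"].astype(str).str.strip().astype(int)
segments_data["item_id"] = segments_data["item_id"].astype(int)

merged = pd.merge(segments_data, meal_type, how="left", on="item_id")
print(merged)
```

After the cast, every row of segments_data in this toy example finds its meal_type.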


How to transpose multiple columns' values by groups, between group delimiters in an adjacent column, in Google Sheets?

I have the following minimal example data (in reality 100's of groups) in range A1:P9 (same data in range A14:A22):
With Sample A1:AR9:
2
61
219
2
4
2
:
61
219
26
26
26
94
21
33
4
26
26
26
94
2
2
:
154
26
40
19
3
2
21
33
14
1
2
3
:
87
39
54
38
26
32
38
26
32
87
39
54
38
26
23
23
4
6
28
2
154
26
2
2
40
19
14
87
39
54
38
26
32
38
26
32
87
39
54
38
26
1
23
2
23
4
4
3
6
20
28
Or Sample A14:AQ22:
2
61
219
2
:
61
219
4
:
26
26
26
94
2
:
21
33
4
26
26
26
94
2
:
154
26
2
:
40
19
3
2
21
33
14
:
87
39
54
38
26
32
38
26
32
87
39
54
38
26
1
:
23
2
:
23
4
:
3
6
20
2
154
26
2
2
40
19
14
87
39
54
38
26
32
38
26
32
87
39
54
38
26
1
23
2
23
4
4
3
6
20
28
I need the output as shown in range Q1:AR3 or as in range Q14:AQ16.
Basically, for each group delimited by the values in Column A, I would need:
The intermediary adjacent values in Column B to be transposed horizontally,
And the adjacent content of Columns C to P (14 columns, at least) to be "joined" together horizontally and sequentially "per group", including the content of the delimiter's row (in Column A).
As a bonus, it would be really nice to have the transposed data followed by a :, and each sub-content of Columns C to P also separated by a | (as shown in screenshot Q1:AR3 or Q14:AR16).
(Or if it's more feasible, alternatively, the simpler to read 2nd model as in A14:AQ22).
I have a really hard time putting together a formula to come to the expected result.
All I could think of was:
Transposing Column B's content by getting the rows of the adjacent Cells with values in column A,
Concatenating with the Column letter,
Duplicating it in a new column, and Filtering out the blank intermediary cells,
Then shifting the duplicated column 1 cell up,
Then concatenating within a TRANSPOSE formula to get the range of the groups,
Then finally transposing all the groups from Column B into a new column
(very convoluted but I couldn't find better way).
To get to that input:
=TRANSPOSE(B1:B3)
=TRANSPOSE(B4:B5)
=TRANSPOSE(B7:B9)
That was already a very manual and error-prone process, and I still could not work out how to do the remaining joining of the content of Columns C to P in a formula.
I tested the following approach, but it's not working and would be very tedious to fix and to implement on large datasets:
=TRANSPOSE(B1:B3)&": "&JOIN( " | " , FILTER(C1:P1, NOT(C2:P2 = "") ))&JOIN( " | " , FILTER(C2:P2, NOT(C2:P2 = "") ))&JOIN( " | " , FILTER(C43:P3, NOT(C3:P3 = "") ))
=TRANSPOSE(B4:B5)&": "&JOIN( " | " , FILTER(C4:P4, NOT(C4:P4 = "") ))&JOIN( " | " , FILTER(C5:P5, NOT(C5:P5 = "") ))
=TRANSPOSE(B6:B9)&": "&JOIN( " | " , FILTER(C6:P6, NOT(C6:P6 = "") ))&JOIN( " | " , FILTER(C7:P7, NOT(C7:P7 = "") ))&JOIN( " | " , FILTER(C8:P8, NOT(C8:P8 = "") ))&JOIN( " | " , FILTER(C8:P8, NOT(C9:P9 = "") ))
What would be a better approach to reach the expected result? Preferably with a formula or, if that is not possible, with a script.
Any help is greatly appreciated.
For Sample 1 try this out:
=LAMBDA(norm,MAP(UNIQUE(norm),LAMBDA(ζ,{TRANSPOSE(FILTER(B1:B9,norm=ζ)),":",SPLIT(BYROW(TRANSPOSE(FILTER(BYROW(C1:P9,LAMBDA(r,TEXTJOIN("ζ",1,r))),norm=ζ)),LAMBDA(rr,TEXTJOIN("γ|γ",1,rr))),"ζγ")})))(SORT(SCAN(,SORT(A1:A9,ROW(A1:A9),),LAMBDA(a,c,IF(c="",a,c))),ROW(A1:A9),))
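For anyone who finds the formula hard to read, here is the same idea as a rough Python sketch. It assumes, as far as I can tell from the formula, that each blank in Column A takes the next non-blank value below it (the SORT/SCAN part), and that each group then becomes one row: the Column B values, a ":", and the non-empty C to P cells of each row separated by "|". The sample rows and the names fill_up and group_rows are made up. Unlike UNIQUE in the formula, groupby only merges adjacent rows that share a label.

```python
from itertools import groupby

# Hypothetical rows: (Column A delimiter or "", Column B value, cells of Columns C..P).
rows = [
    ("", "61", ["26", "", "94"]),
    ("2", "219", ["21", "33", ""]),
    ("", "154", ["40", "", ""]),
    ("4", "26", ["19", "14", ""]),
]

def fill_up(labels):
    """Give each blank label the next non-blank label below it."""
    filled, nxt = [], ""
    for label in reversed(labels):
        nxt = label if label != "" else nxt
        filled.append(nxt)
    return filled[::-1]

def group_rows(rows):
    """Build one output row per group: B values, ':', then C..P cells with '|' between rows."""
    labels = fill_up([a for a, _, _ in rows])
    out = []
    for _, members in groupby(zip(labels, rows), key=lambda pair: pair[0]):
        members = [row for _, row in members]
        line = [b for _, b, _ in members] + [":"]
        for i, (_, _, cells) in enumerate(members):
            if i > 0:
                line.append("|")
            line.extend(c for c in cells if c != "")
        out.append(line)
    return out

result = group_rows(rows)
for line in result:
    print(line)
```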

KDB combine/join different table

How can I join two different tables like
all_order_ask:([]ask:();ask_qty:();exchange_name_ask:())
all_order_bid:([]bid:();bid_qty:();exchange_name_bid:())
and get =====>
final_order:([]ask:();ask_qty:();exchange_name_ask:();bid:();bid_qty:();exchange_name_bid:())
The two tables have the same number of rows.
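You can use uj (union join). Note that on unkeyed tables that contain rows, uj appends the second table's rows below the first and fills the missing columns with nulls, so it reproduces the layout below only for empty tables: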
https://code.kx.com/q/ref/uj/
all_order_ask uj all_order_bid
ask ask_qty exchange_name_ask bid bid_qty exchange_name_bid
-----------------------------------------------------------
q)
If your tables look like this:
all_order_ask
ask ask_qty exchange_name_ask
----------------------------------
7.051033 8 bjd
1.497004 3 lln
2.400771 0 edg
1.039355 7 lij
2.353326 6 hon
6.423479 4 ncp
5.778177 6 gee
2.193148 5 ijf
1.66486 4 bbf
4.784272 2 lmi
all_order_bid
bid bid_qty exchange_name_bid
----------------------------------
15.70605 2 pjbke
10.93533 17 epjak
7.040985 11 ekaaj
14.19316 19 mpnan
9.248942 17 nogel
1.615466 18 holpj
1.073589 16 kkfpn
19.85822 13 pegin
14.45499 8 jcgnm
16.47223 0 dlhep
You can try this:
all_order_ask^all_order_bid
ask ask_qty exchange_name_ask bid bid_qty exchange_name_bid
---------------------------------------------------------------------
7.051033 8 bjd 15.70605 2 pjbke
1.497004 3 lln 10.93533 17 epjak
2.400771 0 edg 7.040985 11 ekaaj
1.039355 7 lij 14.19316 19 mpnan
2.353326 6 hon 9.248942 17 nogel
6.423479 4 ncp 1.615466 18 holpj
5.778177 6 gee 1.073589 16 kkfpn
2.193148 5 ijf 19.85822 13 pegin
1.66486 4 bbf 14.45499 8 jcgnm
4.784272 2 lmi 16.47223 0 dlhep
Since your two tables have the same number of rows, you should also be able to join your two tables horizontally using ,' as follows:
q)final_order_ask:all_order_ask,'all_order_bid
q)final_order_ask
ask ask_qty exchange_name_ask bid bid_qty exchange_name_bid
-----------------------------------------------------------
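If it helps to see what the row-wise join does, here is a rough Python analogy. It is not q: the tables are modelled as lists of row dictionaries with made-up values, and join_each is a hypothetical helper that pairs row i of one table with row i of the other, which is roughly what ,' does for two tables of equal length.

```python
# Hypothetical rows standing in for the two q tables.
all_order_ask = [
    {"ask": 7.05, "ask_qty": 8, "exchange_name_ask": "bjd"},
    {"ask": 1.50, "ask_qty": 3, "exchange_name_ask": "lln"},
]
all_order_bid = [
    {"bid": 15.71, "bid_qty": 2, "exchange_name_bid": "pjbke"},
    {"bid": 10.94, "bid_qty": 17, "exchange_name_bid": "epjak"},
]

def join_each(left, right):
    """Merge row i of left with row i of right; both tables must have the same length."""
    if len(left) != len(right):
        raise ValueError("tables must have the same number of rows")
    return [{**l, **r} for l, r in zip(left, right)]

final_order = join_each(all_order_ask, all_order_bid)
for row in final_order:
    print(row)
```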

How to KEEP only almost-similar records in Cognos

You're right!
This is what we are trying to achieve:
For each OrderID, keep ONLY those OrderIDs whose ColumnD has at least 2 identical consecutive values AND whose ColumnC has at least 2 identical values.
For example, we would keep the first 2 rows of OrderID 101, all 3 rows of OrderID 104, and the 2 rows of OrderID 305.
The rest we don't want to see in the report!
Here is an image that might help explain what we want to achieve.
OrderID ColumnB ColumnC ColumnD
101 159 10 18$
101 132 10 18$
101 147 22 18$
102 111 12 55$
103 130 10 18$
104 123 381 75$
104 456 381 75$
104 789 381 75$
305 555 101 37$
305 652 101 37$
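To pin down the rule independently of Cognos, here is one possible reading of it as a short Python sketch: a row is kept when its (ColumnC, ColumnD) combination occurs at least twice within the same OrderID. This is an assumption about the intent. It ignores the word "consecutive", but it does reproduce the three examples given above.

```python
from collections import Counter

# The sample rows from the table: (OrderID, ColumnB, ColumnC, ColumnD).
rows = [
    (101, 159, 10, "18$"), (101, 132, 10, "18$"), (101, 147, 22, "18$"),
    (102, 111, 12, "55$"), (103, 130, 10, "18$"),
    (104, 123, 381, "75$"), (104, 456, 381, "75$"), (104, 789, 381, "75$"),
    (305, 555, 101, "37$"), (305, 652, 101, "37$"),
]

# Count how often each (OrderID, ColumnC, ColumnD) combination occurs.
counts = Counter((order_id, c, d) for order_id, _, c, d in rows)

# Keep a row only when its combination occurs at least twice within its order.
kept = [row for row in rows if counts[(row[0], row[2], row[3])] >= 2]
for row in kept:
    print(row)
```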

torch / lua: retrieving n-best subset from Tensor

I have the following code now, which stores the index with the maximum score for each question in pred and converts it to a string.
I want to do the same for the n-best indices for each question, not just the single index with the maximum score, and convert them to strings. I also want to display the score for each index (or each converted string).
So scores will have to be sorted, and pred will have to have multiple rows/columns instead of 1 x nqs. And the corresponding score value for each entry in pred must be retrievable.
I am clueless as to Lua/Torch syntax, and any help would be greatly appreciated.
nqs=dataset['question']:size(1);
scores=torch.Tensor(nqs,noutput);
qids=torch.LongTensor(nqs);
for i=1,nqs,batch_size do
  xlua.progress(i, nqs)
  r=math.min(i+batch_size-1,nqs);
  scores[{{i,r},{}}],qids[{{i,r}}]=forward(i,r);
end
tmp,pred=torch.max(scores,2);
answer=json_file['ix_to_ans'][tostring(pred[{i,1}])]
print(answer)
Here is my attempt. I demonstrate its behavior using a simple random scores tensor:
> scores=torch.floor(torch.rand(4,10)*100)
> =scores
9 1 90 12 62 1 62 86 46 27
7 4 7 4 71 99 33 48 98 63
82 5 73 84 61 92 81 99 65 9
33 93 64 77 36 68 89 44 19 25
[torch.DoubleTensor of size 4x10]
Now, since you want the N best indexes for each question (row), let's sort each row of the tensor:
> values,indexes=scores:sort(2)
Now, let's look at what the return tensors contain:
> =values
1 1 9 12 27 46 62 62 86 90
4 4 7 7 33 48 63 71 98 99
5 9 61 65 73 81 82 84 92 99
19 25 33 36 44 64 68 77 89 93
[torch.DoubleTensor of size 4x10]
> =indexes
2 6 1 4 10 9 5 7 8 3
2 4 1 3 7 8 10 5 9 6
2 10 5 9 3 7 1 4 6 8
9 10 1 5 8 3 6 4 7 2
[torch.LongTensor of size 4x10]
As you see, the i-th row of values is the sorted version (in increasing order) of the i-th row of scores, and each row in indexes gives you the corresponding indexes.
You can get the N best values/indexes for each question (i.e. row) with
> N_best_indexes=indexes[{{},{indexes:size(2)-N+1,indexes:size(2)}}]
> N_best_values=values[{{},{values:size(2)-N+1,values:size(2)}}]
Let's see their values for the given example, with N=3:
> return N_best_indexes
7 8 3
5 9 6
4 6 8
4 7 2
[torch.LongTensor of size 4x3]
> return N_best_values
62 86 90
71 98 99
84 92 99
77 89 93
[torch.DoubleTensor of size 4x3]
So, the k-th best value for question j is N_best_values[{{j},{N-k+1}}] (equivalently values[{{j},{values:size(2)-k+1}}]), and its corresponding index in the scores matrix is given by this row, column values:
row=j
column=N_best_indexes[{{j},{N-k+1}}].
For example, the first best value (k=1) for the second question is 99, which lies at the 2nd row and 6th column in scores. And you can see that values[{{2},{values:size(2)}}] is 99, and that indexes[{{2},{indexes:size(2)}}] gives you 6, which is the column index in the scores matrix.
Hope that I explained my solution well.
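For readers who want to check the logic outside Torch, here is the same sort-then-slice idea as a plain Python sketch on the example scores. The helper n_best is hypothetical. It returns 1-based column numbers to match the Lua output, with the best score first, and ties are broken by the lower column, which may differ from Torch's sort.

```python
# The example scores from above, one row per question.
scores = [
    [9, 1, 90, 12, 62, 1, 62, 86, 46, 27],
    [7, 4, 7, 4, 71, 99, 33, 48, 98, 63],
    [82, 5, 73, 84, 61, 92, 81, 99, 65, 9],
    [33, 93, 64, 77, 36, 68, 89, 44, 19, 25],
]

def n_best(row, n):
    """Return (column, score) pairs for the n highest scores, best first (1-based columns)."""
    order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
    return [(j + 1, row[j]) for j in order[:n]]

for question, row in enumerate(scores, start=1):
    print(question, n_best(row, 3))
```

Each pair keeps the score next to its column, so looking up the answer string for a column and printing its score can happen in the same loop.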

Ruby script to extract numbers [closed]

I have a .txt file with characters that look like this:
7 3 5 7 3 3 3 3 3 3 3 6 7 5 5 22 1 4 23 16 18 5 13 34 24 17 50 30 42 35 29 27 52 35 44 52 36 39 25 40 50 52 40 2 52 52 31 35 30 19 32 46 50 43 36 15 21 16 36 25 7 3 5 17 3 3 3 3 23 3 3 46 1 2
I want to extract numbers >10 only if 7 or more of the next 15 numbers are greater than 10 too.
In this case, I would have the output:
22 1 4 23 16 18 5 13 34 24 17 50 30 42 35 29 27 52 35 44 52 36 39 25 40 50 52 40 2 52 52 31 35 30 19 32 46 50 43 36 15 21 16 36 25
Note that this output contains numbers <10, but they pass the condition of having 7 or more of the next 15 numbers >10.
Sounds like a homework question, but I'll give you an attempted answer just for fun anyway.
numbers = "7 3 5 7 3 3 3 3 3 3 3 6 7 5 5 22 1 4 23 16 18 5 13 34 24 17 50 30 42 35 29 27 52 35 44 52 36 39 25 40 50 52 40 2 52 52 31 35 30 19 32 46 50 43 36 15 21 16 36 25 7 3 5 17 3 3 3 3 23 3 3 46 1 2"
numbers.split.each_cons(16).map{|x| x[0] if x[1..15].count{|y| y.to_i > 10} >= 7}.compact
num_string = "7 3 5 7 3 3 3 3 3 3 3 6 7 5 5 22 1 4 23 16 18 5 13 34 24 17 50 30 42 35 29 27 52 35 44 52 36 39 25 40 50 52 40 2 52 52 31 35 30 19 32 46 50 43 36 15 21 16 36 25 7 3 5 17 3 3 3 3 23 3 3 46 1 2"
num_arr = num_string.split(" ")
def next_ones(arr)
  counter = 0
  arr.each do |num|
    if num.to_i > 10
      counter += 1
    end
  end
  if counter >= 7
    arr[0]
  end
end
def processor(arr)
  answer = []
  arr.each_with_index do |num, index|
    if num.to_i > 10
      answer << next_ones(arr[index...(index + 15)])
    end
  end
  answer.compact.join(" ")
end
processor(num_arr)
A bit verbose, and with bad naming, but it should give you some ideas.
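The sliding-window condition can also be sketched in Python. This version assumes the reading suggested by the sample output, namely that any number is kept, whether or not it is itself above 10, as long as at least 7 of the next 15 numbers are greater than 10. The function name extract and its parameters are made up. Numbers near the end of the input simply get a shorter window.

```python
def extract(numbers, window=15, needed=7, threshold=10):
    """Keep numbers[i] when at least `needed` of the next `window` values exceed `threshold`."""
    kept = []
    for i, value in enumerate(numbers):
        following = numbers[i + 1 : i + 1 + window]
        if sum(1 for n in following if n > threshold) >= needed:
            kept.append(value)
    return kept

# Small demonstration with a window of 3 and 2 required matches.
demo = extract([1, 20, 30, 2, 40], window=3, needed=2)
print(demo)
```

To run it on the file, read the text, split it on whitespace, and convert each piece with int before calling extract.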
