Apply function to each row in Torch - lua

I know that tensors have an apply method, but this only applies a function to each element. Is there an elegant way to do row-wise operations? For example, can I multiply each row by a different value?
Say
A =
1 2 3
4 5 6
7 8 9
and
B =
1
2
3
and I want to multiply each element in the ith row of A by the ith element of B to get
1 2 3
8 10 12
21 24 27
how would I do that?

See this link: Torch - Apply function over dimension
(Thanks to Alexander Lutsenko for providing it. I just moved it to the answer.)

One possibility is to expand B as follow:
1 1 1
2 2 2
3 3 3
[torch.DoubleTensor of size 3x3]
Then you can use element-wise multiplication directly:
local A = torch.Tensor{{1,2,3},{4,5,6},{7,8,9}}
local B = torch.Tensor{1,2,3}
local C = A:cmul(B:view(3,1):expand(3,3))

Related

How to manipulate multiple nested arrays in Dyalog APL?

I have been given matrices filled with alphanumerical values excluding lower case letters like so:
XX11X1X
XX88X8X
Y000YYY
ZZZZ789
ABABABC
and have been tasked with counting the repetitions in each row and then tallying up a score depending on the ranking of the character being repeated. I used {⍺ (≢⍵)}⌸¨ ↓ m to help me. For the example above I would get something like this:
X 4 X 4 Y 4 Z 4 A 3
1 3 8 3 0 3 7 1 B 3
8 1 C 1
9 1
This is great but now I need to do a function that would be able to multiply the numbers with each letter. I can access the first matrix with ⊃ but then I am completely lost on how to access the other ones. I can simply write ⊃w[2] and ⊃w[3] and so forth but I need a way to change every matrix at the same time in one function. For this example, the array of the ranking is as follow: ZYXWVUTSRQPONMLKJIHGFEDCBA9876543210 so for the first array XX11X1X
which corresponds to:
X 4
1 3
So the X is 3rd in the array so it corresponds to a 3 and 1 is 35th so it's a 35. The final scoring would be something like (3×104)+(35×103). My biggest problem is not necessarily the scoring part but being able to access each matrix individually in one function. So for this nested array:
X 4 X 4 Y 4 Z 4 A 3
1 3 8 3 0 3 7 1 B 3
8 1 C 1
9 1
if I do arr[1] it gives me the scalar
X 4
1 3
and ⍴ arr[1] gives me nothing confirming it so I can do ⊃arr[1] to get the matrix itself and have access to each column individually. This is where I'm stuck. I'm trying to write a function to be able to do the math for each matrix and then saving those results to an array. I can easily do the math for the first matrix but I can't do it for all of them. I might have made a mistake by making using {⍺ (≢⍵)}⌸¨ ↓ m to get those matrices. Thanks.
Using your example arrangement:
⎕ ← arranged ← ⌽ ⎕D , ⎕A
ZYXWVUTSRQPONMLKJIHGFEDCBA9876543210
So now, we can get the index values:
1 ⌷ m
XX11X1X
∪ 1 ⌷ m
X1
arranged ⍳ ∪ 1 ⌷ m
3 35
While you could compute the intermediary step first, it is much simpler to include most of the final formula in in Key's operand:
{ ( arranged ⍳ ⍺ ) × 10 * ≢⍵ }⌸¨ ↓m
┌───────────┬───────────┬───────────┬─────────────────┬───────────────┐
│30000 35000│30000 28000│20000 36000│10000 290 280 270│26000 25000 240│
└───────────┴───────────┴───────────┴─────────────────┴───────────────┘
Now we just need to sum each:
+/¨ { ( arranged ⍳ ⍺ ) × 10 * ≢⍵ }⌸¨ ↓m
65000 58000 56000 10840 51240
In fact, we can combine the summation with the application of Key to avoid a double loop:
{ +/ { ( arranged ⍳ ⍺ ) × 10 * ≢⍵ }⌸ ⍵}¨ ↓m
65000 58000 56000 10840 51240
For completeness, here is a way to use the intermediary result. Let's start by working on just the first matrix (you can get the second one with 2⊃ instead of ⊃ ― for details, see Problems when trying to use arrays in APL. What have I missed?):
⊃{⍺ (≢⍵)}⌸¨ ↓m
X 4
1 3
We can insert a function between the left column elements and the right column elements with reduction:
{⍺ 'foo' ⍵}/ ⊃{⍺ (≢⍵)}⌸¨ ↓m
┌─────────┬─────────┐
│┌─┬───┬─┐│┌─┬───┬─┐│
││X│foo│4│││1│foo│3││
│└─┴───┴─┘│└─┴───┴─┘│
└─────────┴─────────┘
So now we simply have to modify the placeholder function with one that looks up the left argument in the arranged items, and multiplies by ten to the power of the right argument:
{ ( arranged ⍳ ⍺ ) × 10 * ⍵ }/ ⊃{⍺ (≢⍵)}⌸¨ ↓m
30000 35000
Instead of applying this to only the first matrix, we apply it to each matrix:
{ ( arranged ⍳ ⍺ ) × 10 * ⍵ }/¨ {⍺ (≢⍵)}⌸¨ ↓m
┌───────────┬───────────┬───────────┬─────────────────┬───────────────┐
│30000 35000│30000 28000│20000 36000│10000 290 280 270│26000 25000 240│
└───────────┴───────────┴───────────┴─────────────────┴───────────────┘
Now we just need to sum each:
+/¨ { ( arranged ⍳ ⍺ ) × 10 * ⍵ }/¨ {⍺ (≢⍵)}⌸¨ ↓m
65000 58000 56000 10840 51240
However, this is a much more circuitous approach, and is only provided here for reference.

When re-inserting into queue - Huffman Code

Example
3 2 5 5
a b c d
Joining first two
5 | 5 5
3 2 | c d
a b |
I have to put the new tree of five into the queue
Am I obligated to put it in the end like this:
5 5 5
c d / \
3 2
a b
Or can I put it in the beginning:
5 5 5
3 2 c d
a b
Or even in the middle of 'c' and 'd'
Is it my choice or is there a rule?
It's not your choice, the Queue needs to be sorted at all times (by it's number of occurrences and in case of equal number of occurrences by the depth of the tree). So it needs to be inserted where it belongs into the order.
This is needed to pick the sub-trees with the least amount of occurrences and if there is choice the most shallow one of them by simply pop-ing them.
If you simply resort after every insertion (this is inefficient and should not be done) the position obviously doesn't matter.
Yes, it's your choice. Whichever way you will get an optimal Huffman code, even though two resulting codes can be manifestly different.
You can get:
a - 00
b - 01
c - 10
d - 11
or you can get:
a - 111
b - 110
c - 10
d - 0
Now if I multiply the number of bits in each symbol times the number of occurrences, I get for the first code: 2*3 + 2*2 + 2*5 + 2*5 = 30 bits. For the second code: 3*3 + 3*2 + 2*5 + 1*5 = 30 bits. So both codes will code the original message to exactly 30 bits.

Octave Conditional Merging of matrices

I have searched for an Octave function that facilitates conditional merging of matrices but haven't one so far. My goal is to do this using vectors without looping. Here is an example of what I am trying to do.
A= [1 1
2 2
3 1
5 2];
B= [1 9
2 10];
I would like to get C as
C= [1 1 9
2 2 10
3 1 9
5 2 10];
Is there a function that takes A, B and the list of column(s) to join on and then produce C?
You can use the second output of ismember to find the occurrences of the second column of A in the first column of B and then use that to grab specific entries from the second column of B to construct C.
[~, inds] = ismember(A(:,2), B(:,1));
C = [A, B(inds,2)];
%// 1 1 9
%// 2 2 10
%// 3 1 9
%// 5 2 10

Clustering unique datasets based on similarities (equality)

I just entered into the space of data mining, machine learning and clustering. I'm having special problem, and do not know which technique to use it for solving it.
I want to perform clustering of observations (objects or whatever) on specific data format. All variables in each observation is numeric. My data input looks like this:
1 2 3 4 5 6
1 3 5 7
2 9 10 11 12 13 14
45 1 22 23 24
Let's say that n represent row (observation, or 1D vector,..) and m represents column (variable index in each vector). n could be very large number, and 0 < m < 100. Also main point is that same observation (row) cannot have identical values (in 1st row, one value could appear only once).
So, I want to somehow perform clustering where I'll put observations in one cluster based on number of identical values which contain each row/observation.
If there are two rows like:
1
1 2 3 4 5
They should be clustered in same cluster, if there are no match than for sure not. Also number of each rows in one cluster should not go above 100.
Sick problem..? If not, just for info that I didn't mention time dimension. But let's skip that for now.
So, any directions from you guys,
Thanks and best regards,
JDK
Its hard to recommend anything since your problem is totally vague, and we have no information on the data. Data mining (and in particular explorative techniques like clustering) is all about understanding the data. So we cannot provide the ultimate answer.
Two things for you to consider:
1. if the data indicates presence of species or traits, Jaccard similarity (and other set based metrics) are worth a try.
2. if absence is less informative, maybe you should be mining association rules, not clusters
Either way, without understanding your data these numbers are as good as random numbers. You can easily cluster random numbers, and spend weeks to get the best useless result!
Can your problem be treated as a Bag-of-words model, where each article (observation row) has no more than 100 terms?
Anyway, I think your have to give more information and examples about "why" and "how" you want to cluster these data. For example, we have:
1 2 3
2 3 4
2 3 4 5
1 2 3 4
3 4 6
6 7 8
9 10
9 11
10 12 13 14
What is your expected clustering? How many clusters are there in this clustering? Only two clusters?
Before you give more information, according to you current description, I think you do not need a cluster algorithm, but a structure of connected components. The first round you process the dataset to get the information of connected components, and you need a second round to check each row belong to which connected components. Take the example above, first round:
1 2 3 : 1 <- 1, 1 <- 2, 1 <- 3 (all point linked to the smallest point to
represent they are belong to the same cluster of the smallest point)
2 3 4 : 2 <- 4 (2 and 3 have already linked to 1 which is <= 2, so they do
not need to change)
2 3 4 5 : 2 <- 5
1 2 3 4 : 1 <- 4 (in fact this change are not essential because we have
1 <- 2 <- 4, but change this can speed up the second round)
3 4 6 : 3 <- 6
6 7 8 : 6 <- 7, 6 <- 8
9 10 : 9 <- 9, 9 <- 10
9 11 : 9 <- 11
10 11 12 13 14 : 10 <- 12, 10 <- 13, 10 <- 14
Now we have a forest structure to represent the connected components of points. The second round you can easily pick up one point in each row (the smallest one is the best) and trace its root in the forest. The rows which have the same root are in the same, in your words, cluster. For example:
1 2 3 : 1 <- 1, cluster root 1
2 3 4 5 : 1 <- 1 <- 2, cluster root 1
6 7 8 : 1 <- 1 <- 3 <- 6, cluster root 1
9 10 : 9 <- 9, cluster root 9
10 11 12 13 14 : 9 <- 9 <- 10, cluster root 9
This process takes O(k) space where k is the number of points, and O(nm + nh) time, where r is the height of the forest structure, where r << m.
I am not sure if this is the result you want.

Using COUNTIFS on 3 different columns and then need to SUM a 4th column?

I have written this formula below. I do not know the correct part of this formula that will add the numbers I have in Column AB2:AB552. As it is, this formula is counting the number of cells in that range that has numbers in it, but I need it to total those numbers as my final result. Any help would be great.
=COUNTIFS(Cases!B2:B552,"1",Cases!G2:G552,"c*",Cases!X2:X552,"No",**Cases!AB2:AB552,">0"**)
Assuming you don't actually need the intermediate counts, the sumifs function should give you the final result:
=SUMIFS(Cases!AB2:AB552,Cases!B2:B552,1,Cases!G2:G552,"c",Cases!X2:X552,"No",Cases!AB2:AB552,">0")
Testing this with some limited data:
Row B G X AB
2 2 a No 10
3 1 c No 24
4 2 c No 4
5 1 c No 0
6 1 a Yes 9
7 2 c No 12
8 2 c No 6
9 2 b No 0
10 1 b No 0
11 1 a No 10
12 2 c No 6
13 1 c No 20
14 1 c No 4
15 1 b Yes 22
16 1 b Yes 22
the formula above returned 48, the sum of AB3, AB13, and AB14, which were the only rows matching all 4 criteria

Resources