How to find same edges of two paths? - graph-algorithm

A path is represented by a vector of node ids, and the edges in a path are directed.
Given two paths, for example <1,6,11,7,2,5 ...> and <3,4,8,2,7,3,1,6 ...>, the edge <1,6> appears in both. Sometimes the shared edges are successive, sometimes not, so it would be better to put a flag between the groups. For example,
(1,2) * (5,7,9) * (6,11,12) means the shared edges are 1->2, 5->7, 7->9, 6->11 and 11->12, but there are no edges from 2 to 5 or from 9 to 6, so a '*' or some other symbol is placed there as a flag.
Does anyone have any ideas? I would appreciate it.

Assume each node has only one incoming and one outgoing edge.
Call P1 the first path, of length n, and P2 the second path, of length m. You can turn P2 into a hashmap from start node to end node (e.g. <3,4,5> would become [3->4, 4->5]).
Then, for each index i of P1, you compare P1(i+1) to Hashmap(key=P1(i)). If the hashmap doesn't have the key, or has it with a different value, the edge is not shared; otherwise it is.
(If a node can have multiple outgoing edges, the hashmap values can be Sets of Ints, and you would check whether P1(i+1) is contained in Hashmap(key=P1(i)).)
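The hashmap idea above can be sketched in Python as follows. This is only a sketch under assumptions: the function names `common_edge_runs` and `format_runs` are made up for illustration, the set-valued variant of the map is used so that a node may have several outgoing edges, and shared edges are reported in the order they occur in the first path.

```python
def common_edge_runs(p1, p2):
    """Return the edges of p1 that also occur in p2, grouped into runs
    of successive shared edges (each run is a list of node ids)."""
    # Map each start node of p2 to the set of end nodes it leads to.
    successors = {}
    for a, b in zip(p2, p2[1:]):
        successors.setdefault(a, set()).add(b)

    runs = []
    current = []  # the run currently being built, e.g. [5, 7, 9]
    for a, b in zip(p1, p1[1:]):
        if b in successors.get(a, ()):
            if current:
                current.append(b)   # edge continues the previous shared edge
            else:
                current = [a, b]    # edge starts a new run
        elif current:
            runs.append(current)    # a non-shared edge closes the run
            current = []
    if current:
        runs.append(current)
    return runs


def format_runs(runs, flag=" * "):
    """Join the runs with a flag symbol between non-adjacent groups."""
    return flag.join("(" + ",".join(map(str, run)) + ")" for run in runs)


p1 = [1, 2, 5, 7, 9, 6, 11, 12]
p2 = [5, 7, 9, 0, 1, 2, 0, 6, 11, 12]
print(format_runs(common_edge_runs(p1, p2)))  # (1,2) * (5,7,9) * (6,11,12)
```

Building the map costs O(m) and the scan over P1 costs O(n), so the whole comparison is linear in the two path lengths.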

Here's an example solution in Clojure:
(require 'clojure.set) ; needed for clojure.set/intersection

(defn same-edges [& paths]
  (->> paths
       (map (comp set (partial partition 2 1)))
       (apply clojure.set/intersection)))
So, for each path (map over all paths), you partition the path into 2-element subpaths (using a step of 1 to get all pairs of adjacent items), then calculate the set of all unique pairs attained from that partition. Then you find the intersection of all those sets.
Example:
(same-edges [1 6 11 7 2 5] [3 4 8 2 7 3 1 6])
;=> #{(1 6)}
In other words, the set of shared edges between the two paths represented by the vectors [1 6 11 7 2 5] and [3 4 8 2 7 3 1 6] contains only one item: the pair (1 6).

Related

How do Conv2D and MaxPooling2D apply to their input in Keras?

I'm new to everything I'm about to ask, so these questions may be too simple.
Thanks in advance for your answers!
My questions come from the following image:
To be more clear:
For the first convolution, from 1 x 28 x 28 to 25 x 26 x 26, the input (1 layer) goes through the filter (25 layers). So one layer was filtered 25 times (right?).
But for the second convolution, from 25 x 13 x 13 to 50 x 11 x 11, what operation does the 50 x 3 x 3 filter apply to the 25 x 13 x 13 input? I'm confused about this operation, because the output should be 1250 x 11 x 11 if each layer of the 25 x 13 x 13 input goes through the 50 x 3 x 3 filter. Why does the output still have 50 layers?
For the second max pooling, how does MaxPooling2D() deal with a layer of odd size? The remainder of (11 mod 2) is 1. In the above image, going from 11 to 5, what happened to the 1?
In addition, what is the common way to max-pool an odd-size input layer?
Each convolution filter is applied to all the channels of its input (the output of the previous layer). In this case, one filter of the second Conv2D (the one with 50 filters of size 3x3) is applied to all 25 input feature maps (produced by the first Conv2D, with 25 filters of size 3x3), and the results are summed to give one output feature map. This is done 50 times, once per filter. Here is a link about how filters are applied to feature maps. The rule of thumb is: if a convolution has N filters, its output has N feature maps.
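To make the summing over channels concrete, here is a rough pure-Python sketch of a "valid" convolution. It is not how Keras implements Conv2D: the function name `conv2d_valid` is made up, bias and activation are left out, and the all-ones input and filters are placeholder data chosen only so the shapes are easy to follow.

```python
def conv2d_valid(x, filters):
    """x is [C][H][W]; filters is [N][C][k][k].
    Returns [N][H-k+1][W-k+1]: one output feature map per filter."""
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    k = len(filters[0][0])
    out = []
    for f in filters:  # one output map per filter
        fmap = [[sum(f[c][i][j] * x[c][r + i][s + j]
                     for c in range(channels)  # sum over ALL input channels
                     for i in range(k)
                     for j in range(k))
                 for s in range(width - k + 1)]
                for r in range(height - k + 1)]
        out.append(fmap)
    return out


# Placeholder data: a 25 x 13 x 13 input and 50 filters, each 25 x 3 x 3.
x = [[[1.0] * 13 for _ in range(13)] for _ in range(25)]
filters = [[[[1.0] * 3 for _ in range(3)] for _ in range(25)] for _ in range(50)]

y = conv2d_valid(x, filters)
print(len(y), len(y[0]), len(y[0][0]))  # 50 11 11
```

Note that each filter is really 25 x 3 x 3 (one 3 x 3 slice per input channel), which is why the output has 50 maps rather than 1250.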
For max pooling, the default value of padding in MaxPooling2D, which applies in your case, is "valid". This means the pooling function will not include values that do not fit into a full pooling window. In your example the pool size is 2, so the 11th element is not included in the operation. Here is a good link about the padding="valid" flag; the second answer has a good visual of how some elements of the input are left out during this operation.
Additionally, it may be a good idea to use strides instead of max pooling.
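The size arithmetic for the two padding modes can be sketched as below. The helper names are made up for illustration, and the one-dimensional pooling function only mimics what happens along one axis with padding="valid"; it is not the Keras implementation.

```python
def pool_out_valid(size, pool=2, stride=None):
    """Output length with padding='valid': only full windows are used."""
    stride = stride or pool
    return (size - pool) // stride + 1


def pool_out_same(size, pool=2, stride=None):
    """Output length with padding='same': ceil(size / stride)."""
    stride = stride or pool
    return -(-size // stride)


def max_pool_1d_valid(row, pool=2):
    """Max over non-overlapping full windows; leftover elements are dropped."""
    return [max(row[i:i + pool]) for i in range(0, len(row) - pool + 1, pool)]


print(pool_out_valid(26), pool_out_valid(11))  # 13 5
print(pool_out_same(11))                       # 6
print(max_pool_1d_valid(list(range(1, 12))))   # [2, 4, 6, 8, 10]
```

In the last line the value 11 never appears in the output: it would need a twelfth element to form a full window. With padding="same" the input is padded instead, so the last element is kept and the output length becomes 6.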
You can easily find folks discussing and comparing it on the net. As you're stumbling with dimensionality, using strides is less mind-boggling.
https://stats.stackexchange.com/questions/387482/pooling-vs-stride-for-downsampling
https://www.pyimagesearch.com/2018/12/31/keras-conv2d-and-convolutional-layers/
https://machinelearningmastery.com/padding-and-stride-for-convolutional-neural-networks/
Hope it helps.

Vectorization of FOR loop

Is there a way to vectorize this FOR loop? I know about gallery("circul", y) (thanks to user carandraug),
but that only shifts each row over to the next adjacent cell. I also tried toeplitz, but that didn't work.
I'm trying to make the shift adjustable, which the example code does with circshift and the variable shift_over.
The variable y_new is the output I'm trying to get, but without having to use the FOR loop in the example. Can this FOR loop be vectorized?
Please note: the numbers used in this example are just an example. The real array will be 30-60 second voice/audio signals (so the y_new array could be large) and won't be sequential numbers like 1,2,3,4,5.
tic
y=[1:5];
[rw col]= size(y); %get size to create zero'd array
y_new= zeros(max(rw,col),max(rw,col)); %zero fill new array for speed
shift_over=-2; %cell amount to shift over
for aa=1:length(y)
if aa==1
y_new(aa,:)=y; %starts with original array
else
y_new(aa,:)=circshift(y,[1,(aa-1)*shift_over]); %
endif
end
y_new
fprintf('\nfinally Done-elapsed time -%4.4fsec- or -%4.4fmins- or -%4.4fhours-\n',toc,toc/60,toc/3600);
y_new =
1 2 3 4 5
3 4 5 1 2
5 1 2 3 4
2 3 4 5 1
4 5 1 2 3
Ps: I'm using Octave 4.2.2 Ubuntu 18.04 64bit.
I'm pretty sure this is a classic XY problem: you want to calculate something, and you think it's a good idea to build a redundant n x n matrix, where n is the length of your audio file in samples. Perhaps you want to play with autocorrelation. The key point is that I doubt building the requested matrix is a good idea, but here you go:
Your code:
y = rand (1, 3e3);
shift_over = -2;
clear -x y shift_over
tic
[rw col]= size(y); %get size to create zero'd array
y_new= zeros(max(rw,col),max(rw,col)); %zero fill new array for speed
for aa=1:length(y)
if aa==1
y_new(aa,:)=y; %starts with original array
else
y_new(aa,:)=circshift(y,[1,(aa-1)*shift_over]); %
endif
end
toc
my code:
clear -x y shift_over
tic
n = numel (y);
y2 = y (mod ((0:n-1) - shift_over * (0:n-1).', n) + 1);
toc
gives on my system:
Elapsed time is 1.00379 seconds.
Elapsed time is 0.155854 seconds.
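For readers who want to check the index formula outside Octave, here is the same arithmetic written as a plain Python sketch. The function name `shifted_rows` is made up; rows and columns are counted from 0, so the "+ 1" of the Octave version disappears. Row i is the input circularly shifted by i * shift_over, which means element j of that row comes from input position (j - shift_over * i) mod n.

```python
def shifted_rows(y, shift_over):
    """Row i holds y circularly shifted by i * shift_over positions."""
    n = len(y)
    return [[y[(j - shift_over * i) % n] for j in range(n)] for i in range(n)]


for row in shifted_rows([1, 2, 3, 4, 5], -2):
    print(row)
# [1, 2, 3, 4, 5]
# [3, 4, 5, 1, 2]
# [5, 1, 2, 3, 4]
# [2, 3, 4, 5, 1]
# [4, 5, 1, 2, 3]
```

This reproduces the y_new matrix from the question. It still builds all n x n entries, so the memory concern raised above applies here as well.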

Modifying Dijkstra to find path with max coloured node

I just saw a solution to a question about modifying Dijkstra to get the shortest path with at most K coloured edges. I am wondering: what if we want to find the shortest path with at most K coloured nodes instead of edges? How would we modify Dijkstra to do the trick?
What I came up with is this: on top of Dijkstra, I add an integer variable, say i. Then I make a map that records how many coloured nodes it takes to reach each node, and if there is a way that passes through fewer coloured nodes, I update it. Then we take the path with the fewest coloured nodes. But something seems wrong with this. Any suggestions?
Algorithm Dijkstra ( G , s in V(G), c(v) in{black, white}, K )
1. for each vertex u in V(G) do dist[u] <- +infinity
2. dist[s] <- 0 ; p[s] <- null
3. c(s)=black? r <- 1 : r <- 0
4. Q <- ConstructMinHeap(V(G), dist)
5. M <- Map(s, r)
6. while Q != null do
7. u <- DeleteMin(Q)
8. for each v in Adj[u] do
9. if M.value(u) is null then do
10. M <- Map(u, M.value(v) + c(u)=black? 1 : 0)
11. else
12. M.value(u) < (M.value(v) + c(u)=black? 1 : 0)? Update : do nothing
13. end-if
14. if dist[v] > dist[u] + w(u,v) and M.value < K then do
15. dist[v] <- dist[u] + w(u,v)
16. p[v] <- u
17. UpHeap(v,Q)
18. end-if
19. end-for
20. end-while
end
If you use a priority queue to rank your options, consider using both the distance so far and the number of coloured nodes passed through to determine the order of priority. In this manner, you can use the traditional Dijkstra and let your priority ranking determine the minimum path.
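One possible reading of this suggestion, sketched in Python: keep the coloured-node count as part of each queue entry, and track the best distance per (node, count) pair rather than per node. This goes a little beyond what the answer states, so treat it as an assumption-laden sketch. The function name, the adjacency-dict format and the example graph are made up, and edge weights are assumed to be non-negative.

```python
import heapq


def shortest_path_max_coloured(adj, coloured, source, target, k):
    """adj maps u to a list of (v, weight) pairs; coloured is a set of nodes.
    Returns the length of the shortest source-target path that visits at
    most k coloured nodes (endpoints included), or None if none exists."""
    start_count = 1 if source in coloured else 0
    if start_count > k:
        return None
    best = {(source, start_count): 0}    # best distance per (node, count)
    heap = [(0, start_count, source)]    # ordered by distance, then count
    while heap:
        dist, count, u = heapq.heappop(heap)
        if dist > best.get((u, count), float("inf")):
            continue                     # stale queue entry
        if u == target:
            return dist                  # first pop of target is optimal
        for v, w in adj.get(u, ()):
            new_count = count + (1 if v in coloured else 0)
            if new_count > k:
                continue                 # would exceed the colour budget
            new_dist = dist + w
            if new_dist < best.get((v, new_count), float("inf")):
                best[(v, new_count)] = new_dist
                heapq.heappush(heap, (new_dist, new_count, v))
    return None


adj = {"s": [("a", 1), ("b", 5)], "a": [("t", 1)], "b": [("t", 5)]}
print(shortest_path_max_coloured(adj, {"a"}, "s", "t", 1))  # 2
print(shortest_path_max_coloured(adj, {"a"}, "s", "t", 0))  # 10
```

Keeping one distance per node, as in the pseudocode of the question, can discard a longer route that uses fewer coloured nodes and is the only one that stays within the budget later on. Tracking (node, count) pairs avoids that, at the cost of up to K+1 states per node.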

Compute similarity between n entities

I am trying to compute the similarity between n entities that are being described by entity_id, type_of_order, total_value.
An example of the data might look like:
NR entity_id type_of_order total_value
1 1 A 10
2 1 B 90
3 1 C 70
4 2 B 20
5 2 C 40
6 3 A 10
7 3 B 50
8 3 C 20
9 4 B 50
10 4 C 80
My question is: what is a good way of measuring the similarity between, for example, entity_id 1 and 2 with regard to the type_of_order and the total_value for that type of order?
Would a simple KNN give satisfactory results or should I consider other algorithms?
Any suggestion would be much appreciated.
The similarity metric is a heuristic to capture a relationship between two data rows, with respect to the data semantics and the purpose of the training. We don't know your data; we don't know your usage. It would be irresponsible to suggest metrics to solve a problem when we have no idea what problem we're solving.
You have to address this question to the person you find in the mirror. You've given us three features with no idea of what they mean or how they relate. You need to quantify ...
relative distances within features: under type_of_order, what is the relationship (distance) between any two measurements? If we arbitrarily assign d(A, B) = 1, then what is d(B, C)? We have no information to help you construct this. Further, if we give that some value c, then what is d(A, C)? In various popular metrics, it could be 1+c, |1-c|, all distances could be 1, or perhaps it's something else -- even more than 1+c in some applications.
Even in the last column, we cannot assume that d(10, 20) = d(40, 50); the actual difference could be a ratio, difference of squares, etc. Again, this depends on the semantics behind these labels.
relative weights between features: How do the differences in the various columns combine to provide a similarity? For instance, how does d([A, 10], [B, 20]) compare to d([A, 10], [C, 30])? That's two letters in the left column, two steps of 10 in the right column. How about d([A, 10], [A, 20]) vs d([A, 10], [B, 10])? Are the distances linear, or do the relationships change as we slide up the alphabet or to higher numbers?

mean image filter

I'm starting to learn image filtering and am stumped by a question found on a website: Applying a 3×3 mean filter twice does not produce quite the same result as applying a 5×5 mean filter once. However, a 5×5 convolution kernel can be constructed which is equivalent. What does this kernel look like?
Would appreciate help so that I can understand the subject better. Thanks.
Marcelo's answer is right. Another way of seeing it (it is easier to think of it first in one dimension): we know that the mean filter is equivalent to a convolution with a rectangular window. And we know that convolution is a linear operation, which is also associative.
Now, applying a mean filter M to a signal X can be written as
Y = M * X
where * denotes convolution. Applying the filter twice would then give
Y = M * (M * X) = (M * M) * X = M2 * X
This says that filtering a signal twice with a mean filter is the same as filtering it once with an equivalent filter given by M2 = M * M. This amounts to applying the mean filter to itself, which gives a "smoother" filter (a triangular filter in this case).
The process can be repeated (see first graph here), and it can be shown that the equivalent filter for many repetitions of a mean filter (N convolutions of the rectangular filter with itself) tends to a Gaussian filter. Further, it can be shown that the Gaussian filter has the property you didn't find in the rectangular (mean) filter: two passes of a Gaussian filter are equivalent to another Gaussian filter.
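The step M2 = M * M can be checked numerically. The sketch below convolves the 3x3 mean kernel with itself using a hand-written full 2-D convolution (the function name `convolve_full` is made up) and exact fractions, so no rounding is involved. It says nothing about how image borders are treated, where filtering twice and filtering once with M2 can differ.

```python
from fractions import Fraction


def convolve_full(a, b):
    """Full 2-D convolution of two small kernels given as lists of rows."""
    rows = len(a) + len(b) - 1
    cols = len(a[0]) + len(b[0]) - 1
    out = [[Fraction(0)] * cols for _ in range(rows)]
    for i, a_row in enumerate(a):
        for j, a_val in enumerate(a_row):
            for p, b_row in enumerate(b):
                for q, b_val in enumerate(b_row):
                    out[i + p][j + q] += a_val * b_val
    return out


mean3 = [[Fraction(1, 9)] * 3 for _ in range(3)]  # 3x3 mean kernel
m2 = convolve_full(mean3, mean3)                  # equivalent 5x5 kernel

for row in m2:
    print([int(v * 81) for v in row])  # entries as multiples of 1/81
# [1, 2, 3, 2, 1]
# [2, 4, 6, 4, 2]
# [3, 6, 9, 6, 3]
# [2, 4, 6, 4, 2]
# [1, 2, 3, 2, 1]
print(sum(sum(row) for row in m2))     # 1
```

Each row and column has the triangular profile 1, 2, 3, 2, 1, which is the one-dimensional rectangular window [1, 1, 1] convolved with itself.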
3x3 mean:
[1 1 1]
[1 1 1] * 1/9
[1 1 1]
3x3 mean twice:
[1 2 3 2 1]
[2 4 6 4 2]
[3 6 9 6 3] * 1/81
[2 4 6 4 2]
[1 2 3 2 1]
How? Each cell contributes indirectly via one or more intermediate 3x3 windows. Consider the set of stage 1 windows that contribute to a given stage 2 computation. The number of such 3x3 windows that contain a given source cell determines the contribution by that cell. The middle cell, for instance, is contained in all nine windows, so its contribution is 9 * 1/9 * 1/9. I don't know if I've explained it that well, so I hope it makes sense to you.
Actually I believe that 3x3 twice should give:
[1 2 3 2 1]
[2 4 6 4 2]
[3 6 9 6 3] * 1/81
[2 4 6 4 2]
[1 2 3 2 1]
The reason is that the sum of all values must be equal to 1.
