Dijkstra algorithm under constraint - graph-algorithm

I have N vertices one being the source. I would like to find the shortest path that connects all the vertices together (so a N-steps path) with the constraint that all the vertices cannot be visited at whichever step.
A network is defined by N the number of vertices, the source, the cost to travel between each pair of vertices and, for each step the list of vertices that can be visited
For example, if N=5 and the vertices are 1(the source),2,3,4 and 5, the list [[2, 3, 4], [2, 3, 4, 5], [2, 3, 4, 5], [3, 4, 5]] means that for step 2 only vertices 2,3 and 4 can be visited and so forth...
I can't figure out how to adapt the Dijkstra algorithm to my problem. I would really like some ideas Or maybe a better solution is to find something else, are there others algorithm that can handle this problem ?
Note : I posted the same question at math.stackexchange, I apologize if it is considered as a duplicate

You don't need any adaptation. Dijkstra algorithm will work fine under these constraints.
Following your example:
Starting from the vertex 1 we can get to 2 (let's suppose distance d = 2), 3 (d = 7) and 4 (d = 11) - current values of distance is [0, 2, 7, 11, N/A]
Next, pick the vertex with the shortest distance (vertex 2) - we can get from it to 2 again (shouldn't be counted), 3 (d = 3), 4 (d = 4) or 5 (d = 9). We see, that we can get to the vertex 3 with distance 2 + 3 = 5 < 7, which is shorter than 7, so update the value. The same is for the vertex 4 (2 + 4 = 6 < 11) - current values are [0, 2, 5, 6, 9]
Mark all the vertices we visited and follow the algorithm until all the vertices are selected.

Related

Dask Tutorial: Explain unexpected results from tutorial

The tutorial page requested that we ask questions here.
On tutorial 01_dask.delayed, there is the following code:
Parallelizing Increment
Prep
from time import sleep
def inc(x):
sleep(1)
return x + 1
def add(x, y):
sleep(1)
return x + y
data = [1, 2, 3, 4, 5, 6, 7, 8]
Calc
results = []
for x in data:
y = delayed(inc)(x)
results.append(y)
total = delayed(sum)(results)
print("Before computing:", total) # Let's see what type of thing total is
result = total.compute()
print("After computing :", result) # After it's computed
This code takes 1 second. This makes sense; each of the 8 inc calculations takes 1 second, the rest are ~ instantaneous, and it can all be run fully in parallel.
Parallelizing Increment and Double
Prep
def double(x):
sleep(1)
return 2 * x
def is_even(x):
return not x % 2
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Calc
results = []
for x in data:
if is_even(x): # even
y = delayed(double)(x)
else: # odd
y = delayed(inc)(x)
results.append(y)
#total = delayed(sum)(results)
total = sum(results)
This takes 2 seconds, which seems strange to me. The situation is the same as above; there are 10 actions that each take 1 second each, and can again be run fully in parallel.
The only thing I can imagine is that my machine is only able to allow for 8 tasks in parallel, but this is tough to know for sure because I have an Intel Core i7 and it seems that some have 8 threads and some have 16. (I have a MacBook Pro, and Apple notoriously likes to hide this detailed information from us pleebs.)
Can anyone confirm if this is what is going on? I am nearly certain, because bumping the data object for the first portion from data = [1, 2, 3, 4, 5, 6, 7, 8] to data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] also bumps the time up to 2 seconds.
I believe your analysis is correct, and you have 8 threads running in parallel for 1s each, before moving on to the remaining data, which do not fill all the threads, but still take 1s to complete.
You may want to try with the distributed scheduler, which provides dashboards for more feedback on what is going on (see later in the tutorial).

How to efficiently find correspondences between two point sets without nested for loop in Pytorch?

I now have two point sets (tensor) A and B that shape like
A.size() >>(50, 3) , example: [ [0, 0, 0], [0, 1, 2], ..., [1, 1, 1]]
B.size() >>(10, 3)
where the first dimension stands for number of points and the second dim stands for coordinates (x,y,z)
To some extent, the question could also be simplified into " Finding common elements between two tensors ". Is there a quick way to do this without nested loop ?
You can quickly compute all the 50x10 distances using:
d2 = ((A[:, None, :] - B[None, ...])**2).sum(dim=2)
Once you have all the pair-wise distances, you can select "similar" ones if the distance does not exceed a threshold thr:
(d2 < thr).nonzero()
returns pairs of a-idx, b-idx of "similar" points.
If you want to match the points exactly, you can do instead:
((a[:, None, :] == b[None, ...]).all(dim=2)).nonzero()

Correct way to split data to batches for Keras stateful RNNs

As the documentation states
the last state for each sample at index i in a batch will be used as
initial state for the sample of index i in the following batch
does it mean that to split data to batches I need to do it the following way
e.g. let's assume that I am training a stateful RNN to predict the next integer in range(0, 5) given the previous one
# batch_size = 3
# 0, 1, 2 etc in x are samples (timesteps and features omitted for brevity of the example)
x = [0, 1, 2, 3, 4]
y = [1, 2, 3, 4, 5]
batches_x = [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
batches_y = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
then the state after learning on x[0, 0] will be initial state for x[1, 0]
and x[0, 1] for x[1, 1] (0 for 1 and 1 for 2 etc)?
Is it the right way to do it?
Based on this answer, for which I performed some tests.
Stateful=False:
Normally (stateful=False), you have one batch with many sequences:
batch_x = [
[[0],[1],[2],[3],[4],[5]],
[[1],[2],[3],[4],[5],[6]],
[[2],[3],[4],[5],[6],[7]],
[[3],[4],[5],[6],[7],[8]]
]
The shape is (4,6,1). This means that you have:
1 batch
4 individual sequences = this is batch size and it can vary
6 steps per sequence
1 feature per step
Every time you train, either if you repeat this batch or if you pass a new one, it will see individual sequences. Every sequence is a unique entry.
Stateful=True:
When you go to a stateful layer, You are not going to pass individual sequences anymore. You are going to pass very long sequences divided in small batches. You will need more batches:
batch_x1 = [
[[0],[1],[2]],
[[1],[2],[3]],
[[2],[3],[4]],
[[3],[4],[5]]
]
batch_x2 = [
[[3],[4],[5]], #continuation of batch_x1[0]
[[4],[5],[6]], #continuation of batch_x1[1]
[[5],[6],[7]], #continuation of batch_x1[2]
[[6],[7],[8]] #continuation of batch_x1[3]
]
Both shapes are (4,3,1). And this means that you have:
2 batches
4 individual sequences = this is batch size and it must be constant
6 steps per sequence (3 steps in each batch)
1 feature per step
The stateful layers are meant to huge sequences, long enough to exceed your memory or your available time for some task. Then you slice your sequences and process them in parts. There is no difference in the results, the layer is not smarter or has additional capabilities. It just doesn't consider that the sequences have ended after it processes one batch. It expects the continuation of those sequences.
In this case, you decide yourself when the sequences have ended and call model.reset_states() manually.

What is the first parameter (gradients) of the backward method, in pytorch?

We have the following code from the pytorch documentation:
x = torch.randn(3)
x = Variable(x, requires_grad=True)
y = x * 2
while y.data.norm() < 1000:
y = y * 2
gradients = torch.FloatTensor([0.1, 1.0, 0.0001])
y.backward(gradients)
What exactly is the gradients parameter that we pass into the backward method and based on what do we initialize it?
To fully answer your question, it'd require a somewhat longer explanation that evolves around the details of how Backprop or, more fundamentally, the chain rule works.
The short programmatic answer is that the backwards function of a Variable computes the gradient of all variables in the computation graph attached to that Variable. (To clarify: if you have a = b + c, then the computation graph (recursively) points first to b, then to c, then to how those are computed, etc.) and cumulatively stores (sums) these gradients in the .grad attribute of these Variables. When you then call opt.step(), i.e. a step of your optimizer, it adds a fraction of that gradient to the value of these Variables.
That said, there are two answers when you look at it conceptually: If you want to train a Machine Learning model, you normally want to have the gradient with respect to some loss function. In this case, the gradients computed will be such that the overall loss (a scalar value) will decrease when applying the step function. In this special case, we want to compute the gradient to a specific value, namely the unit length step (so that the learning rate will compute the fraction of the gradients that we want). This means that if you have a loss function, and you call loss.backward(), this will compute the same as loss.backward(torch.FloatTensor([1.])).
While this is the common use case for backprop in DNNs, it is only a special case of general differentiation of functions. More generally, the symbolic differentiation packages (autograd in this case, as part of pytorch) can be used to compute gradients of earlier parts of the computation graph with respect to any gradient at a root of whatever subgraph you choose. This is when the key-word argument gradient comes in useful, since you can provide this "root-level" gradient there, even for non-scalar functions!
To illustrate, here's a small example:
a = nn.Parameter(torch.FloatTensor([[1, 1], [2, 2]]))
b = nn.Parameter(torch.FloatTensor([[1, 2], [1, 2]]))
c = torch.sum(a - b)
c.backward(None) # could be c.backward(torch.FloatTensor([1.])) for the same result
print(a.grad, b.grad)
prints:
Variable containing:
1 1
1 1
[torch.FloatTensor of size 2x2]
Variable containing:
-1 -1
-1 -1
[torch.FloatTensor of size 2x2]
While
a = nn.Parameter(torch.FloatTensor([[1, 1], [2, 2]]))
b = nn.Parameter(torch.FloatTensor([[1, 2], [1, 2]]))
c = torch.sum(a - b)
c.backward(torch.FloatTensor([[1, 2], [3, 4]]))
print(a.grad, b.grad)
prints:
Variable containing:
1 2
3 4
[torch.FloatTensor of size 2x2]
Variable containing:
-1 -2
-3 -4
[torch.FloatTensor of size 2x2]
and
a = nn.Parameter(torch.FloatTensor([[0, 0], [2, 2]]))
b = nn.Parameter(torch.FloatTensor([[1, 2], [1, 2]]))
c = torch.matmul(a, b)
c.backward(torch.FloatTensor([[1, 1], [1, 1]])) # we compute w.r.t. a non-scalar variable, so the gradient supplied cannot be scalar, either!
print(a.grad, b.grad)
prints
Variable containing:
3 3
3 3
[torch.FloatTensor of size 2x2]
Variable containing:
2 2
2 2
[torch.FloatTensor of size 2x2]
and
a = nn.Parameter(torch.FloatTensor([[0, 0], [2, 2]]))
b = nn.Parameter(torch.FloatTensor([[1, 2], [1, 2]]))
c = torch.matmul(a, b)
c.backward(torch.FloatTensor([[1, 2], [3, 4]])) # we compute w.r.t. a non-scalar variable, so the gradient supplied cannot be scalar, either!
print(a.grad, b.grad)
prints:
Variable containing:
5 5
11 11
[torch.FloatTensor of size 2x2]
Variable containing:
6 8
6 8
[torch.FloatTensor of size 2x2]

How to predict users' preferences using item similarity?

I am thinking if I can predict if a user will like an item or not, given the similarities between items and the user's rating on items.
I know the equation in collaborative filtering item-based recommendation, the predicted rating is decided by the overall rating and similarities between items.
The equation is:
http://latex.codecogs.com/gif.latex?r_{u%2Ci}%20%3D%20\bar{r_{i}}%20&plus;%20\frac{\sum%20S_{i%2Cj}%28r_{u%2Cj}-\bar{r_{j}}%29}{\sum%20S_{i%2Cj}}
My question is,
If I got the similarities using other approaches (e.g. content-based approach), can I still use this equation?
Besides, for each user, I only have a list of the user's favourite items, not the actual value of ratings.
In this case, the rating of user u to item j and average rating of item j is missing. Is there any better ways or equations to solve this problem?
Another problem is, I wrote a python code to test the above equation, the code is
mat = numpy.array([[0, 5, 5, 5, 0], [5, 0, 5, 0, 5], [5, 0, 5, 5, 0], [5, 5, 0, 5, 0]])
print mat
def prediction(u, i):
target = mat[u,i]
r = numpy.mean(mat[:,i])
a = 0.0
b = 0.0
for j in range(5):
if j != i:
simi = 1 - spatial.distance.cosine(mat[:,i], mat[:,j])
dert = mat[u,j] - numpy.mean(mat[:,j])
a += simi * dert
b += simi
return r + a / b
for u in range(4):
lst = []
for i in range(5):
lst.append(str(round(prediction(u, i), 2)))
print " ".join(lst)
The result is:
[[0 5 5 5 0]
[5 0 5 0 5]
[5 0 5 5 0]
[5 5 0 5 0]]
4.6 2.5 3.16 3.92 0.0
3.52 1.25 3.52 3.58 2.5
3.72 3.75 3.72 3.58 2.5
3.16 2.5 4.6 3.92 0.0
The first matrix is the input and the second one is the predicted values, they looks not close, anything wrong here?
Yes, you can use different similarity functions. For instance, cosine similarity over ratings is common but not the only option. In particular, similarity using content-based filtering can help with a sparse rating dataset (if you have relatively dense content metadata for items) because you're mapping users' preferences to the smaller content space rather than the larger individual item space.
If you only have a list of items that users have consumed (but not the magnitude of their preferences for each item), another algorithm is probably better. Try market basket analysis, such as association rule mining.
What you are referring to is a typical situation of implicit ratings (i.e. users do not give explicit ratings to items, let's say you just have likes and dislikes).
As for the approches you can use Neighbourhood models or latent factor models.
I will suggest you to read this paper that proposes a well known machine-learning based solution to the problem.

Resources