Particle swarm optimization pbest and gbest - machine-learning

Can I update the particles' velocity and position and then find pbest and gbest, or must pbest and gbest be found first? What are pbest and gbest in PSO?

If I understood your question correctly, the answer is yes. As we know, PSO's main update equations are:

v_i(t+1) = w*v_i(t) + c1*r1*(Pbest_i - x_i(t)) + c2*r2*(Gbest - x_i(t))
x_i(t+1) = x_i(t) + v_i(t+1)

where x and v are the position and velocity, w, c1 and c2 are constants, and r1 and r2 are two random numbers. In summary, the PSO algorithm flows like this:
1. Start your population
2. Set the constants (w, c1 and c2)
3. Check the stop criterion or convergence
4. Get the random numbers r1 and r2
5. Update Gbest and Pbest
6. Update v and x
7. Return to step 3
Pbest stores the best position found so far by particle k, and Gbest stores the best position found by any particle. Gbest is what pulls all particles toward the global max/min, and it is also affected by the topology:
If you use a global topology, it's easier to get stuck in a local min/max. On the other hand, your algorithm may converge faster. So, this depends on your problem and you need to test.
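As a minimal Python sketch of that loop (the sphere objective, bounds and swarm size below are illustrative assumptions, not part of the original answer), note that Pbest and Gbest are updated from the current positions before the velocity and position updates:

import numpy as np

def pso(objective, dim=2, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5):
    # Step 1: start the population; step 2: constants are the keyword arguments
    x = np.random.uniform(-5, 5, (n_particles, dim))   # positions
    v = np.zeros((n_particles, dim))                    # velocities
    pbest = x.copy()
    pbest_val = np.apply_along_axis(objective, 1, x)
    gbest = pbest[np.argmin(pbest_val)].copy()

    for _ in range(iters):                              # step 3: stop criterion
        r1, r2 = np.random.rand(2)                      # step 4: random numbers
        # Step 5: update Pbest and Gbest from the current positions
        vals = np.apply_along_axis(objective, 1, x)
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
        # Step 6: update velocity and position
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
    return gbest

# Example: minimize the sphere function
print(pso(lambda p: np.sum(p**2)))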

Related

How to align two Point Clouds given the set of points and point-to-point correspondence?

Suppose I have two point clouds [x1, x2, x3...] and [y1, y2, y3...]. These two point clouds should be as close as possible. There are a lot of algorithms and deep learning techniques for the point cloud registration problem. But I have the extra information that points x1 and y1 should be aligned, x2 and y2 should be aligned, and so on.
So the order of the points in both point clouds is the same. How can I use this to properly get the transformation matrix that aligns these two point clouds?
Note: These two point clouds are not exactly the same. Actually, I have a ground-truth point cloud [x1, x2, x3...] and I tried to reconstruct another point cloud as [y1, y2, y3...]. Now I want to match them and visualize whether the reconstruction is good or not.
The problem you are facing is an overdetermined system of equations, which is solvable with a closed-form expression. No need for iterative methods like ICP, since you have the correspondence between points.
If you're looking for a similarity transform (rotation, translation and uniform scaling, but no shearing), you want Umeyama's algorithm [3], which is closed-form as well; there is a Python implementation here: https://gist.github.com/nh2/bc4e2981b0e213fefd4aaa33edfb3893
If you are looking for an affine transform between your point clouds, i.e. a linear transform A (that allows shearing, see [2]) as well as a translation t (which is not linear):
Then, each of your points must satisfy the equation:
y = Ax + t.
Here we assume the following shapes: y:(d,n), A:(d,d), x:(d,n), t:(d,1) if each cloud has n points in R^d.
You can also write it in homogeneous notation, by adding an extra coordinate, see [1]. This results in a linear system y=Mx, and a lot (assuming n>d) of pairs (x,y) that satisfy this equation (i.e. an overdetermined system).
You can therefore solve this using a closed-form least square method:
import numpy as np
import scipy.linalg

# Inputs:
# - P, a (n, dim) [or (dim, n)] matrix, a point cloud of n points in dim dimensions.
# - Q, a (n, dim) [or (dim, n)] matrix, a point cloud of n points in dim dimensions.
#   P and Q must have the same shape.
# This function returns:
# - Pt, the P point cloud, transformed to fit Q
# - (T, t), the affine transform
def affine_registration(P, Q):
    transposed = False
    if P.shape[0] < P.shape[1]:
        transposed = True
        P = P.T
        Q = Q.T
    (n, dim) = P.shape
    # Compute least squares
    p, res, rnk, s = scipy.linalg.lstsq(np.hstack((P, np.ones([n, 1]))), Q)
    # Get translation
    t = p[-1].T
    # Get transform matrix
    T = p[:-1].T
    # Compute transformed point cloud
    Pt = P @ T.T + t
    if transposed:
        Pt = Pt.T
    return Pt, (T, t)
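A minimal usage sketch (the synthetic data and noise level are my own illustrative choices, not from the original answer):

# Build a synthetic cloud, apply a known affine map, and recover it.
rng = np.random.default_rng(0)
X = rng.random((100, 3))                      # ground-truth cloud, (n, dim)
A = np.array([[1.0, 0.2, 0.0],
              [0.0, 0.9, 0.1],
              [0.0, 0.0, 1.1]])
t_true = np.array([0.5, -0.3, 0.2])
Y = X @ A.T + t_true + 0.01 * rng.standard_normal(X.shape)   # noisy target

Xt, (T, t) = affine_registration(X, Y)
print(np.abs(Xt - Y).mean())                  # small residual if the fit is good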
OpenCV has a function called getAffineTransform(); however, it only takes 3 pairs of points as input. https://theailearner.com/tag/cv2-getaffinetransform/. This won't be robust for your case (if, e.g., you give it only the first 3 pairs of points).
References:
[1] https://web.cse.ohio-state.edu/~shen.94/681/Site/Slides_files/transformation_review.pdf#page=24
[2] https://docs.opencv.org/3.4/d4/d61/tutorial_warp_affine.html
[3] https://stackoverflow.com/a/32244818/4195725
As another user already mentioned, the ICP algorithm (an implementation in PCL can be found here) can be used to register two point clouds to each other. However, this only works locally, so the clouds have to be aligned first.
I don't think there is a global registration in PCL at the moment, but I've used OpenGR which has a PCL wrapper.
If you know for sure that x1 is near y1, x2 is near y2 etc. you can do a manual alignment which will be a lot faster than global alignment:
Translate 2nd cloud by vector y1-x1
Rotate vector y2-y1 into vector x2-x1
Then refine it using ICP.
This does not account for measurement errors, so using the matrix estimation above will be useful if your data is not 100% correct.
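A rough numpy sketch of that manual alignment, assuming 3D points and that the second cloud Y is aligned onto the first cloud X (scipy's Rotation.align_vectors is used here as one way to rotate y2-y1 onto x2-x1; the function name is my own):

import numpy as np
from scipy.spatial.transform import Rotation

def manual_align(X, Y):
    # Translate Y so that y1 coincides with x1
    Y0 = Y - Y[0] + X[0]
    # Rotate about x1 so that the direction y2-y1 matches x2-x1.
    # Note: the roll around that axis stays unconstrained with only two
    # correspondences, which is what the ICP refinement is for.
    a = (X[1] - X[0]).reshape(1, 3)
    b = (Y0[1] - Y0[0]).reshape(1, 3)
    R, _ = Rotation.align_vectors(a, b)   # rotation taking b towards a
    return R.apply(Y0 - X[0]) + X[0]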
VTK's vtkLandmarkTransform also does the same thing, with support for RigidBody/Similarity/Affine transformation:
// need at least four pairs of points in sourcePoint and targetPoints,
// can pick more, but probably not too many
vtkLandmarkTransform landmarkTransform = new vtkLandmarkTransform();
landmarkTransform.SetSourceLandmarks(sourcePoints); // source is to be transformed to match the target
landmarkTransform.SetTargetLandmarks(targetPoints); // target stays still
landmarkTransform.SetMode(VTK_Modes.AFFINE);
landmarkTransform.Modified(); // do the calculation
landmarkTransform.GetMatrix(mtx);
// now you can apply the mtx to all points

Machine Learning: Why xW+b instead of Wx+b?

I started to learn machine learning. Now I am trying to play around with TensorFlow.
Often I see examples like this:
pred = tf.add(tf.mul(X, W), b)
I also saw such a line in a plain numpy implementation. Why is x*W+b always used instead of W*x+b? Is there an advantage to multiplying the matrices this way? I see that it is possible (if X, W and b are transposed), but I do not see the advantage. In school, in math class, we always used Wx+b.
Thank you very much
This is the reason:
By default, w is a vector of weights, and in maths a vector is considered a column, not a row.
X is a collection of data, stored as an n x d matrix (where n is the number of data points and d the number of features); upper-case X is an n x d matrix, while lower-case x is a single data point, a 1 x d matrix.
To multiply them correctly, so that each weight meets its corresponding feature, you must use X*w+b:
With X*w you multiply every feature by its corresponding weight, and by adding b you add the bias term to every prediction.
If you multiply w*X you are multiplying a (d x 1) by an (n x d), and the dimensions do not match.
I'm also confused by this. I guess it may be a matter of dimensions. For an n x m matrix W and an n-dimensional vector x, xW+b is easily read as mapping an n-dimensional feature to an m-dimensional feature, i.e. you can think of W as an n-dimension -> m-dimension operation, whereas Wx+b (x must now be an m-dimensional vector) becomes an m-dimension -> n-dimension operation, which looks less natural in my opinion. :D
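A small numpy illustration of the shape argument (the sizes here are made up for the example):

import numpy as np

n, d, m = 4, 3, 2          # n samples, d input features, m outputs
X = np.random.rand(n, d)   # each row is one sample
W = np.random.rand(d, m)   # maps d features -> m outputs
b = np.random.rand(m)

pred = X @ W + b           # (n, d) @ (d, m) + (m,) -> (n, m): one row per prediction
# W @ X would be (d, m) @ (n, d): the inner dimensions do not match,
# so with row-wise samples the x*W + b ordering is the natural one.
print(pred.shape)          # (4, 2)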

Cluster adjacent points

I have a sequence of xy planes with integer coordinates and each one has points scattered differently over it.
For each plane I would like to perform clustering of the points, putting a point in the same cluster as another point in the cluster whenever it is less than d away (or exactly d away).
For example, if there is a point P1(x,y) in the cluster and d=1, then also
P2(x+1,y)
P3(x,y+1)
P4(x+1,y+1)
P5(x-1,y)
P6(x,y-1)
P7(x-1,y-1)
P8(x+1,y-1)
P9(x-1,y+1)
will fall in the same cluster. Graphically:
P9   P3   P4
  \   |   /
P5 - P1 - P2
  /   |   \
P7   P6   P8
Which clustering algorithm is best suited for this task?
This is not so much a clustering problem; rather, you have a neighbor relation,
and you want to compute the transitive closure of that neighbor relation.
This is a much simpler problem, and it has an obvious and efficient straightforward solution (breadth-first search):
While there are unprocessed points:
    Initialize a new empty result set.
    Working set = choose any one (!) unprocessed point
    While the working set is not empty:
        Add the working set to the result set
        Mark all points in the working set as processed
        Working set = all unprocessed neighbors of the previous working set
    Return the result set as a new group.
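A minimal Python sketch of that search, assuming integer 2D coordinates and the neighborhood from the question (d = 1 includes the 8 surrounding cells, i.e. Chebyshev distance; the function name and the simple linear neighbor scan are my own choices):

from collections import deque

def cluster_points(points, d=1):
    unprocessed = set(points)
    groups = []
    while unprocessed:
        seed = unprocessed.pop()              # choose any one unprocessed point
        group, working = [seed], deque([seed])
        while working:
            (px, py) = working.popleft()
            # all unprocessed neighbors of the current point
            near = [p for p in unprocessed
                    if max(abs(p[0] - px), abs(p[1] - py)) <= d]
            for p in near:
                unprocessed.remove(p)
                group.append(p)
                working.append(p)
        groups.append(group)                  # the result set becomes a new group
    return groups

print(cluster_points([(0, 0), (1, 1), (5, 5), (5, 6)]))
# e.g. [[(0, 0), (1, 1)], [(5, 5), (5, 6)]] (order may vary)

For large planes, bucketing the points into a grid keyed by cell makes the neighbor lookup constant-time instead of a scan over all unprocessed points.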

How to design an O(m) time algorithm to compute the shortest cycle of G(undirected unweighted graph) that contains s?

How to design an O(m) time algorithm to compute the shortest cycle of G (an undirected, unweighted graph) that contains s (s ∈ V)?
You can run a BFS from your node s as the starting point; this will give you a BFS tree. Afterwards you can build a lowest-common-ancestor (LCA) data structure on this BFS tree. This can be done, for example, with Tarjan's lowest-common-ancestor algorithm; I will not go into details here. Given two nodes v and w, LCA lets you find the lowest node in a tree (the BFS tree in our case) that has both v and w as descendants. The idea is that when you are considering two nodes that are adjacent in the graph, you want to check whether their paths to the root (s in this case) plus the edge that connects them form a cycle (through s). This is the case if their LCA is s.
Assuming you have built the LCA structure, you run a second BFS. When expanding the neighbours of a node v, you also take into consideration the nodes already marked as explored. Suppose x is a neighbour of v such that x has already been explored. If the LCA of v and x is s, then the path from x to s and the path from v to s in the BFS tree, plus the edge xv, form a cycle. The first such x and v that you encounter in your second BFS give you the desired result. If no such x exists, then s is not contained in any cycle.
The cycle found is also the shortest one containing s.
The two BFS runs take O(m), and the LCA construction can also be done in linear time, hence the whole procedure can be implemented in O(m).
This might be a bit overkill. There surely is a much simpler solution.
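For reference, here is a sketch of that simpler variant: a single BFS from s that records which child-of-s subtree ("branch") each node lies in; a non-tree edge whose endpoints lie in different branches closes a cycle through s, which is exactly the LCA(u, v) = s condition above. The adjacency-list format and function name are my own assumptions.

from collections import deque

def shortest_cycle_through(adj, s):
    # adj: dict mapping node -> list of neighbors (undirected, unweighted)
    dist = {s: 0}
    branch = {s: s}                  # which child-of-s subtree a node lies in
    queue = deque([s])
    best = float("inf")
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:        # tree edge
                dist[v] = dist[u] + 1
                branch[v] = v if u == s else branch[u]
                queue.append(v)
            elif v != s and branch[u] != branch[v]:
                # non-tree edge joining two different subtrees of s:
                # it closes a cycle through s of length dist[u] + dist[v] + 1
                best = min(best, dist[u] + dist[v] + 1)
    return best                      # float("inf") if s lies on no cycle

g = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
print(shortest_cycle_through(g, 1))  # 4, the cycle 1-2-4-3-1

This is a single BFS plus constant work per edge, so it also runs in O(m).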

Feedback on algorithm for Steiner Tree with restrictions

For an assignment, I have to create a Steiner Tree. However, this is not a typical Steiner Tree, as the graph structure we're required to use does not allow insertion of new vertices. Rather, the test cases define a graph structure of N vertices and M edges while specifically marking X vertices as target nodes. These are the nodes we have to span while using some, none or all of the unmarked vertices in the graph.
My solution to this problem is:
Implement Dijkstra's algorithm to find the shortest path between all the target vertices
For each of the shortest paths 1:n
    Extract all current selected path vertices into a set
    Extract all remaining vertices into a set
    For all vertices of the current selected path 1:m
        Execute Dijkstra to find the shortest path between the current vertex and the other path's vertices
        If this creates a spanning tree, save the path and length in a priority queue sorted by length value
Pop the top of the priority queue and return the path
My issue is that this is an exhaustive search that uses the initial application of Dijkstra to create a reduced set of possible start-end vertices for a shorter path than a minimum spanning tree.
Is there a heuristic or other algorithm that may solve this problem?
With some help, I worked out this answer for a similar problem that I had. Rather than adding new vertices as in a spatial Steiner tree problem, the new Steiner points in this graph are the vertices that lie along the paths between the marked nodes. For a graph with N vertices, M edges, X required vertices, and S found vertices (vertices along our path):
Compute all-pairs shortest paths (Floyd-Warshall, Johnson's, whatever)
for k in X
    remove k from X, insert k into S
    for v in (X + S) - both sets
        find the shortest distance from k to v - path P
        for u in P (all vertices on the path)
            insert u into S
            if u exists in X, remove u from X
Now for the wall of text as to what this algorithm does. We pick a vertex k in X, and then find the minimum distance to the nearest other vertex in the target set X, or in the result set S, and call it v. Then we follow the path of nodes from k to v, inserting them into our result set. Finally, we double check and make sure that any vertices in X that were on the path (which shouldn't happen) are removed from X.
Any new vertex c that you want to add will have a minimum distance to some node already in your result set S. Since the nodes already in S are a minimum distance apart, it follows that the path chosen for c is the minimum-distance connection from S to c. For example, if you have three nodes A, B, and C, and A and B have already been found to be a minimum distance apart, then adding C fulfills the requirement that it is the minimum distance from B, and the minimum-distance path from A to C goes through B.
I did some research on the discrete Steiner Tree problem (which is what this is), and this is the best brute force solution that I found. The main problem is going to be the O(n^3) time it takes to do all pairs shortest paths, but then the construction of the minimum tree should be straightforward and quick, since you just need to look up distance information. The implementation I wound up working with is outlined nicely on wikipedia.
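For concreteness, here is a rough Python sketch of this kind of shortest-path heuristic; it is my own illustrative implementation (not the poster's code), using Floyd-Warshall for the all-pairs distances and growing the tree by repeatedly attaching the terminal closest to it:

import itertools

def steiner_tree_approx(n, edges, terminals):
    # n: number of vertices 0..n-1; edges: list of (u, v, weight); terminals: the target set X
    INF = float("inf")
    # All-pairs shortest paths (Floyd-Warshall) with a successor matrix for path recovery
    dist = [[INF] * n for _ in range(n)]
    nxt = [[None] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0
    for u, v, w in edges:
        if w < dist[u][v]:
            dist[u][v] = dist[v][u] = w
            nxt[u][v], nxt[v][u] = v, u
    for k, i, j in itertools.product(range(n), repeat=3):
        if dist[i][k] + dist[k][j] < dist[i][j]:
            dist[i][j] = dist[i][k] + dist[k][j]
            nxt[i][j] = nxt[i][k]

    def path(u, v):
        p = [u]
        while u != v:
            u = nxt[u][v]
            p.append(u)
        return p

    X = set(terminals)                 # terminals not yet connected
    tree = {X.pop()}                   # the result set S: vertices on the tree so far
    while X:
        # attach the terminal that is currently closest to the tree
        k, v = min(((k, v) for k in X for v in tree),
                   key=lambda kv: dist[kv[0]][kv[1]])
        tree.update(path(k, v))        # insert all vertices on the path into S
        X -= tree                      # drop terminals that are now connected
    return tree

# Terminals 0, 3 and 4 in a small weighted graph
print(steiner_tree_approx(5, [(0, 1, 1), (1, 2, 1), (2, 3, 1), (1, 4, 1), (0, 3, 10)],
                          [0, 3, 4]))  # {0, 1, 2, 3, 4}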
