If I change a parameter such as the clustering coefficient or the average shortest path length, it has no effect on the visualization of the network.
That is correct. If you want the network to reflect a parameter like that, you need to use an edge-weighted layout and then select that column as the weight. Note that if you manually change the value, you will need to relayout the network.
-- scooter
Related
I have an accident location dataset. I have applied several clustering algorithms to this dataset using the latitude and longitude columns. Now I would like to measure the accuracy of each clustering algorithm separately so that I can compare them.
I want to apply the confusion matrix described in this article.
But I am not able to understand what I should consider as a label? I have made my clusters using only two columns latitude and longitude. Can anyone guide me, please? I have the code but it's not clear to me. I mean what is the label or class label in my case?
In a confusion matrix you provide two sets of labels for each entry. One of these labels is the cluster assignment generated by the clustering you did. The second label can be the ground truth, which allows you to determine accuracy/precision.
Your case sounds like there is no ground truth, so you can't compare for accuracy. You CAN use the result of one of the different algorithms you used as the second set of labels, to compare the result between these two clusterings.
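To illustrate comparing two clusterings with each other, here is a minimal sketch that builds the confusion (contingency) table between two label sets. The label lists and the names kmeans_labels and dbscan_labels are made up for illustration; they are not from the question.

```python
from collections import Counter

def contingency_table(labels_a, labels_b):
    """Count how many points fall into each (cluster in A, cluster in B) pair."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same points")
    counts = Counter(zip(labels_a, labels_b))
    rows = sorted(set(labels_a))
    cols = sorted(set(labels_b))
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

# Hypothetical cluster assignments for the same six accident locations
kmeans_labels = [0, 0, 1, 1, 2, 2]
dbscan_labels = [1, 1, 0, 0, 0, 2]

rows, cols, table = contingency_table(kmeans_labels, dbscan_labels)
for r, row in zip(rows, table):
    print(f"first clustering, cluster {r}: {row}")
```

Note that cluster ids are arbitrary, so the diagonal of this table has no special meaning. What matters is whether each row concentrates its points in a single column.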
Why is it wrong to think that it only needs the data since it: "outputs a measure of how similar an object is to its own cluster (cohesion) compared to other clusters (separation)."
However, I also need to input the labels (which the function itself computes); so, why are the labels necessary to input?
how similar an object is to its own cluster
In order to compute the silhouette, you need to know to which cluster your samples belong.
Also:
The Silhouette Coefficient is calculated using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample. The Silhouette Coefficient for a sample is (b - a) / max(a, b). To clarify, b is the distance between a sample and the nearest cluster that the sample is not a part of.
You need the labels to know what "intra-cluster" and "nearest-cluster" mean.
silhouette_score is a metric for clustering quality, not a clustering algorithm. It considers both the inter-cluster and intra-cluster distances.
For that calculation to happen, you need to supply both the data and the cluster labels (estimated by an unsupervised method like K-means).
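To make the role of the labels concrete, here is a rough NumPy sketch of the per-sample formula (b - a) / max(a, b) quoted above. It is not the library implementation; the function name and the toy data are made up, and it assumes Euclidean distance and at least two clusters.

```python
import numpy as np

def silhouette_samples_sketch(X, labels):
    """Per-sample silhouette (b - a) / max(a, b), assuming Euclidean distance."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise distances between all samples
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the sample itself from its own cluster
        if not same.any():
            continue  # singleton cluster: leave the score at 0
        a = dist[i, same].mean()  # mean intra-cluster distance
        b = min(dist[i, labels == k].mean()  # mean distance to nearest other cluster
                for k in np.unique(labels) if k != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores

# Same data, two different (hypothetical) label assignments
X = [[0, 0], [0, 1], [10, 0], [10, 1]]
good = silhouette_samples_sketch(X, [0, 0, 1, 1]).mean()
bad = silhouette_samples_sketch(X, [0, 1, 0, 1]).mean()
print(good, bad)
```

The data is identical in both calls; only the labels change, and so does the score. That is why the data alone is not enough.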
I have a bunch of gray-scale images decomposed into superpixels. Each superpixel in these images has a label in the range [0, 1]. You can see one sample of the images below.
Here is the challenge: I want the spatially (locally) neighboring superpixels to have consistent labels (close in value).
I'm kind of interested in smoothing local labels but do not want to apply Gaussian smoothing functions or whatever, as some colleagues suggested. I have also heard about Conditional Random Field (CRF). Is it helpful?
Any suggestion would be welcome.
I'm kind of interested in smoothing local labels but do not want to apply Gaussian smoothing functions or whatever, as some colleagues suggested.
And why is that? Why not consider the helpful advice of your colleagues, who are actually right? Applying a smoothing function is the most reasonable way to go.
I have also heard about Conditional Random Field (CRF). Is it helpful?
This also suggests that you should go with your colleagues' advice, as a CRF has nothing to do with your problem. A CRF is a classifier, a sequence classifier to be exact, that requires labeled examples to learn from, and it has nothing to do with the setting presented.
What are typical approaches?
Exactly what your colleagues proposed: define a smoothing function and apply it to each function value and its neighbourhood. (I will not use the term "labels" as it is misleading: you have continuous values in [0,1], whereas a "label" denotes a categorical variable in machine learning.)
Another approach would be to define some optimization problem, where your current assignment of values is one goal, and the second one is "closeness", for example:
Let us assume that you have points with values {(x_i, y_i)}_{i=1}^N and that n(x) returns indices of neighbouring points of x.
Consequently you are trying to find {a_i}_{i=1}^N such that they minimize
SUM_{i=1}^N (y_i - a_i)^2 + C * SUM_{i=1}^N SUM_{j \in n(x_i)} (a_i - a_j)^2
Here the first sum measures closeness to the current values, the double sum measures closeness to the neighbouring values, and the constant C weights the two parts against each other.
You can solve the above optimization problem using many techniques, for example with the scipy.optimize.minimize function.
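As a sketch of one such technique: if the neighbourhood relation is symmetric (an assumption made here, not stated above), the objective is quadratic in the a_i, and setting its gradient to zero gives the linear system (I + 2C L) a = y, where L is the graph Laplacian of the neighbourhood graph. The function names and the three-superpixel example are made up for illustration.

```python
import numpy as np

def objective(a, y, neighbours, C):
    """SUM (y_i - a_i)^2 + C * SUM_i SUM_{j in n(i)} (a_i - a_j)^2."""
    data_term = sum((y[i] - a[i]) ** 2 for i in range(len(y)))
    smooth_term = sum((a[i] - a[j]) ** 2
                      for i, js in neighbours.items() for j in js)
    return data_term + C * smooth_term

def smooth_values(y, neighbours, C=1.0):
    """Minimise the objective exactly, assuming symmetric neighbourhoods."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.zeros((n, n))
    for i, js in neighbours.items():
        for j in js:
            A[i, j] = A[j, i] = 1.0  # enforce symmetry of the adjacency
    L = np.diag(A.sum(axis=1)) - A  # graph Laplacian
    # Gradient of the objective is 2(a - y) + 4 C L a; set it to zero
    return np.linalg.solve(np.eye(n) + 2.0 * C * L, y)

# Hypothetical chain of three superpixels: 0 - 1 - 2
neighbours = {0: [1], 1: [0, 2], 2: [1]}
y = [0.0, 1.0, 0.0]
a = smooth_values(y, neighbours, C=1.0)
print(a, objective(a, y, neighbours, 1.0))
```

Larger values of C pull neighbouring values closer together, and C = 0 returns the original values unchanged.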
I am not sure that your request makes any sense.
Having close label values for nearby superpixels is trivial: take some smooth function of (X, Y), such as constant or affine, taking values in the range [0,1], and assign the function value to the superpixel centered at (X, Y).
You could also take the distance function from any point in the plane.
But this is of no use as it is unrelated to the image content.
I have a trained neural network which suitably maps my inputs to my outputs. Is it then possible to specify a desired y output and then use a gradient descent method to determine the optimum input values to get that output?
When using backpropagation, the partial derivative of the error function with respect to each weight is used to adjust the weights proportionally; is there a way to do something similar with the input values themselves and a target y value?
A neural network is basically a complex mathematical function. By adjusting the weights you basically adjust that function's parameters. Given that, your question is if you can easily and automatically invert the function. I don't think this can be done easily.
I think that the only thing you can do is to create another inverted network and train it with inverted data.
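For what it is worth, the idea from the question can at least be tried directly: keep the weights fixed and run gradient descent on the input. The sketch below does this for a tiny hand-written network. The weights are made up for illustration and stand in for a trained network. This is not an inversion of the function: it only finds one input whose output is close to the target, and it may find none if the target is unreachable.

```python
import numpy as np

# Hypothetical fixed "trained" weights: 2 inputs -> 3 tanh hidden units -> 1 output
W1 = np.array([[1.0, -1.0], [0.5, 0.5], [-0.5, 1.0]])
b1 = np.zeros(3)
w2 = np.array([1.0, -1.0, 0.5])
b2 = 0.0

def forward(x):
    """Return the network output and the hidden activations."""
    h = np.tanh(W1 @ x + b1)
    return w2 @ h + b2, h

def optimise_input(target, x0, lr=0.05, steps=2000):
    """Gradient descent on the input x; the weights are never changed."""
    x = np.array(x0, dtype=float)
    losses = []
    for _ in range(steps):
        y, h = forward(x)
        err = y - target
        losses.append(0.5 * err ** 2)
        # Chain rule: d(loss)/dx = err * W1^T (w2 * tanh'(pre-activation))
        grad_x = err * (W1.T @ (w2 * (1.0 - h ** 2)))
        x -= lr * grad_x
    return x, losses

x_found, losses = optimise_input(target=0.5, x0=[0.0, 0.0])
print(x_found, forward(x_found)[0])
```

Different starting points x0 can lead to different inputs with the same output, since the mapping from inputs to outputs is generally many-to-one.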
I am trying to detect a curve of a certain shape and its position from a signal as shown below:
(link to picture: http://tinypic.com/view.php?pic=ab5j45&s=6)
I would be getting the signal as an array of floats.
Due to noise and other variations, the curve may not be exact, so I cannot use simple number matching. I was wondering if there is something in OpenCV which I can use for this.
Note that I will need to detect curves of different shapes and their positions in the signal, but once I know how to detect one type, I can use the same method to detect other types.
Regards,
Peter
I would try to define a parametric mathematical function representing the shape you want to match.
Then all you need to do is apply a technique (for instance least squares) to get the parameter values that best match the curve to your signal.
You may want to match your function against a sliding window, especially if you want to match multiple events in your signal.
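A minimal sketch of the sliding-window idea, assuming the shape is a fixed template whose only free parameters are its position and a constant vertical offset. The Gaussian bump, the noise level and the function name are made up for illustration. Each window is scored by its sum of squared differences from the template after both have been mean-centred.

```python
import numpy as np

def best_match(signal, template):
    """Return (start index, residual) of the window closest to the template."""
    signal = np.asarray(signal, dtype=float)
    template = np.asarray(template, dtype=float)
    m = len(template)
    centred_template = template - template.mean()
    best_start, best_residual = None, np.inf
    for start in range(len(signal) - m + 1):
        window = signal[start:start + m]
        # Removing the mean is the least-squares fit of a constant offset
        residual = np.sum((window - window.mean() - centred_template) ** 2)
        if residual < best_residual:
            best_start, best_residual = start, residual
    return best_start, best_residual

# Hypothetical test signal: a Gaussian bump at sample 120 on a noisy baseline
rng = np.random.default_rng(0)
template = np.exp(-0.5 * ((np.arange(21) - 10) / 3.0) ** 2)
signal = 2.0 + 0.05 * rng.standard_normal(200)
signal[120:141] += template

start, residual = best_match(signal, template)
print(start, residual)
```

To find several events, one could keep every window whose residual falls below a threshold instead of only the best one. A shape with more free parameters (width, amplitude) would need a proper least-squares fit inside each window.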
Noise and "other variations" are high frequencies, so you need to filter the signal with a low-pass filter (for filtering, use the convolution operation). It seems that your signal has very low frequencies (below 5 kHz maybe?). After filtering, look at your signal, and when you get the desired curve shape, apply numerical matching.
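As a rough illustration of low-pass filtering by convolution, the sketch below uses a moving average, which is only one simple choice of low-pass kernel. The window width and the synthetic signal are made up for illustration; a real signal would need a width chosen from its sampling rate and the bandwidth of the curve.

```python
import numpy as np

def moving_average(signal, width=9):
    """Low-pass filter: convolve with a flat kernel of the given width."""
    kernel = np.ones(width) / width
    return np.convolve(signal, kernel, mode="same")

# Hypothetical signal: a slow sine wave plus white noise
rng = np.random.default_rng(0)
t = np.arange(400)
clean = np.sin(2 * np.pi * t / 200.0)
noisy = clean + 0.3 * rng.standard_normal(t.size)
filtered = moving_average(noisy)

interior = slice(10, -10)  # ignore edge samples affected by zero padding
print(np.std(noisy[interior] - clean[interior]),
      np.std(filtered[interior] - clean[interior]))
```

A wider kernel removes more noise but also flattens the curve you are trying to detect, so the width is a trade-off.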
A matched filter has its highest peak at the integer position in the signal that best matches the given shape (or the energy of the pattern to be matched). In addition, the neighboring values of the matched filter output can often be used to fine-tune the position by calculating tau = (a - b)/(a + b) (IIRC), where a is the peak value and b is the second-best value.
This works especially well, if the signal to be matched has good auto correlation characteristics -- one high peak and close to zero at +-1 from the peak (basically means detecting pilot signals).
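A minimal sketch of the integer-position part of this, using cross-correlation as the matched filter. The pattern is a length-13 Barker code, chosen here only because it has the sharp autocorrelation peak described above. The noise level and the insertion point are made up for illustration, and the sub-sample refinement is left out.

```python
import numpy as np

def matched_filter_peak(signal, pattern):
    """Cross-correlate signal with pattern and return (peak index, output)."""
    output = np.correlate(signal, pattern, mode="valid")
    return int(np.argmax(output)), output

# Length-13 Barker code: autocorrelation peak of 13, sidelobes of magnitude <= 1
barker13 = np.array([1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1], dtype=float)

# Hypothetical noisy signal with the pattern inserted at sample 50
rng = np.random.default_rng(0)
signal = 0.3 * rng.standard_normal(150)
signal[50:63] += barker13

peak, output = matched_filter_peak(signal, barker13)
print(peak, output[peak])
```

With mode="valid", index k of the output corresponds to the pattern starting at sample k of the signal, so the peak index is directly the detected position.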