simple way to tell if MST will improve if a specific edge cost is reduced? - graph-algorithm

G is an undirected connected graph with positive costs on all edges. Given is edge e whose cost is strictly more than 10. We need to answer whether the MST cost will improve if the cost of e is reduced by 10.
I know of a solution that involves generating a new graph with only edges with cost<cost(e)-10. What's wrong with this much simpler solution:
Take one of e's vertices v. Find the minimal cost edge incident to v. Now reduce e's cost and find the minimal cost edge incident to v again. If there was a change, it means that prim would find a better MST and the cost is improved. If not, it means that prim would find the same MST and the cost stays the same.
What's wrong with this logic?
related to Update minimum spanning tree with modification of edge

I don't think that your solution is correct.
Consider the following graph G = (V, E), V = {a, b, c, d, e}, E = {ab, bc, cd, de, ae, bd} and the respective weights are {5, 10, 10, 5, 17}.
By running Kruskal or Prim, we find that our MST is {ab, bc, cd, de}, and its weight is 30.
Now, let's reduce the weight of the edge bd from 17 to 7, and examine the edges again.
Running Prim or Kruskal with G' will output an MST which weighs 27 (actually we have 2 such MSTs {ab, bd, de, cd} and {ab, bd, de, bc}).
But if we use your algorithm, we would get the same exact tree, because when we examine the nodes b or d, the edge bd is not the lightest edge that is adjacent to either one of these nodes.

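Let G = (V, E) be a graph and, for each vertex v, let C(v) = min { w(<u,v>) : <u,v> ∈ E } be the minimum weight among the edges incident to v,
where w(<u,v>) is the weight of <u,v>.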
Lemma 1
Let G be a graph, v a vertex of G and e an edge of G incident to v. If
w(e) = C(v) then e belongs to some MST of G.
It is true that if the value of C(v) changes when e's cost is reduced by 10, then the MST cost improves; this follows from Lemma 1.
So the first half is fine. Let's take a look at the second part:
If not, it means that prim would find the same MST and the cost stays the same.
General explanation
The quoted sentence wrongly assumes that the converse of Lemma 1 holds (if e belongs to some MST of G, then w(e) = C(v)): it claims that if w(e) != C(v) after e's cost is reduced by 10, then the MST cost is preserved, which would imply that e doesn't belong to any MST of G.
Short explanation: a counterexample
Let G = ({1, 2, 3, 4}, {<1, 2>, <1, 3>, <2, 4>, <3, 4>, <1, 4>}) with weight function w(<1, 2>) = 1, w(<1, 3>) = 3, w(<2, 4>) = 3, w(<3, 4>) = 1, w(<1, 4>) = 12 and e = <1, 4>.
After reducing e's cost to 2, we still have C(1) = C(4) = 1 != w(e). The proposed algorithm states that "prim would find the same MST and the cost stays the same".
Let's check if there is a decrease in G's MST cost when the cost of e is reduced by 10:
MST cost before reducing the cost of e by 10: 5
MST cost after reducing the cost of e by 10: 4
Since the MST cost decreases, the quoted claim is false and the proposed algorithm doesn't work.
Note: The algorithm is wrong no matter which MST algorithm is used, as the counterexample relies only on MST properties.
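
The counterexample above can be checked mechanically. The following is a minimal sketch, assuming a plain Kruskal implementation with union-find; the helper names (mst_cost, cheapest_incident, reduce_edge) are made up for this illustration and do not come from the question.

```python
def mst_cost(n, edges):
    """Kruskal's algorithm; edges are (u, v, w) with vertices numbered 1..n."""
    parent = list(range(n + 1))

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total = 0
    for u, v, w in sorted(edges, key=lambda edge: edge[2]):
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components, so it enters the tree
            parent[ru] = rv
            total += w
    return total


def cheapest_incident(v, edges):
    """C(v): the minimum weight among the edges incident to v."""
    return min(w for a, b, w in edges if v in (a, b))


def reduce_edge(edges, e, delta):
    """Copy of edges in which the cost of edge e = (u, v) is lowered by delta."""
    return [(a, b, w - delta if {a, b} == set(e) else w) for a, b, w in edges]


edges = [(1, 2, 1), (1, 3, 3), (2, 4, 3), (3, 4, 1), (1, 4, 12)]
e = (1, 4)
reduced = reduce_edge(edges, e, 10)

# The local test from the question: did C(v) change at an endpoint of e?
heuristic_says_improved = any(
    cheapest_incident(v, edges) != cheapest_incident(v, reduced) for v in e
)
before, after = mst_cost(4, edges), mst_cost(4, reduced)
print(heuristic_says_improved, before, after)  # False 5 4
```

The local test reports no change at either endpoint of e, yet the MST cost drops from 5 to 4.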


Misconceptions about the Shannon-Nyquist theorem

I am a student working with time-series data which we feed into a neural network for classification (my task is to build and train this NN).
We're told to use a band-pass filter of 10 Hz to 150 Hz since anything outside that is not interesting.
After applying the band-pass, I've also down-sampled the data to 300 samples per second (originally it was 768 Hz). My understanding of the Shannon Nyquist sampling theorem is that, after applying the band-pass, any information in the data will be perfectly preserved at this sample-rate.
However, I got into a discussion with my supervisor who claimed that 300 Hz might not be sufficient even if the signal was band-limited. She says that it is only the minimum sample rate, not necessarily the best sample rate.
My understanding of the sampling theorem makes me think the supervisor is obviously wrong, but I don't want to argue with my supervisor, especially in case I'm actually the one who has misunderstood.
Can anyone help to confirm my understanding or provide some clarification? And how should I take this up with my supervisor (if at all).
The Nyquist-Shannon theorem states that the sampling frequency should be at least twice the bandwidth, i.e.,
fs > 2B
So, this is the minimum criterion. If the sampling frequency is less than 2B, there will be aliasing. There is no upper limit on the sampling frequency, but the higher the sampling frequency, the better the reconstruction will be.
So, I think your supervisor is right in saying that it is the minimal condition and not the best one.
Actually, you and your supervisor are both wrong. The minimum sampling rate required to faithfully represent a real-valued time series whose spectrum lies between 10 Hz and 150 Hz is 140 Hz, not 300 Hz. I'll explain this, and then I'll explain some of the context that shows why you might want to "oversample", as it is called (spoiler alert: the Balian-Low Theorem). The supervisor is mixing folklore into the discussion, and when folklore is not put in its proper context, it tends to telephone-tag its way into fakelore. (That's a common failing even in the peer-reviewed literature, by the way.) And there's a lot of fakelore here that needs to be defogged.
For the following, I will use the following conventions.
There's no math layout on Stack Overflow (except what we already have with UTF-8), so ...
a^b denotes a raised to the power b.
∫_I (⋯x⋯) dx denotes an integral of (⋯x⋯) taken over all x ∈ I, with the default I = ℝ.
The support supp φ (or supp_x φ(x) to make the "x" explicit) of a function φ(x) is the smallest closed set containing all the x-es for which φ(x) ≠ 0. For regularly-behaving (e.g. continuously differentiable) functions that means a union of closed intervals and/or half-rays or the whole real line, itself. This figures centrally in the Shannon-Nyquist sampling theorem, as its main condition is that a spectrum have bounded support; i.e. a "finite bandwidth".
For the Fourier transform I will use the version that has the 2π up in the exponent, and for added convenience, I will use the convention 1^x = e^{2πix} = cos(2πx) + i sin(2πx) (which I refer to as the Ramanujan Convention, as it is the convention I frequently used in my previous life oops I mean which Ramanujan secretly used in his life to make the math a whole lot simpler).
The set ℤ = {⋯, -2, -1, 0, +1, +2, ⋯ } is the integers, and 1^{x+z} = 1^x for all z∈ℤ - making 1^x the archetype of a periodic function whose period is 1.
Thus, the Fourier transform f̂(ν) of a function f(t) and its inverse are given by:
f̂(ν) = ∫ f(t) 1^{-νt} dt, f(t) = ∫ f̂(ν) 1^{+νt} dν.
The spectrum of the time series given by the function f(t) is the function f̂(ν) of the cyclic frequency ν, which is what is measured in Hertz (Hz.); t, itself, being measured in seconds. A common convention is to use the angular frequency ω = 2πν, instead, but that muddies the picture.
The most important example, with respect to the issue at hand, is the Fourier transform χ̂_Ω of the interval function given by χ_Ω(ν) = 1 if ν ∈ [-½Ω,+½Ω] and χ_Ω(ν) = 0 else:
χ̂_Ω(t) = ∫_[-½Ω,+½Ω] 1^{νt} dν
= {1^{+½Ωt} - 1^{-½Ωt}}/{2πit}
= {2i sin(πΩt)}/{2πit}
= Ω sinc(Ωt)
which is where the function sinc x = (sin πx)/(πx) comes into play.
The cardinal form of the sampling theorem is that a function f(t) can be sampled over an equally-spaced sampled domain T ≡ { kΔt: k ∈ ℤ }, if its spectrum is bounded by supp f̂ ⊆ [-½Ω,+½Ω] ⊆ [-1/(2Δt),+1/(2Δt)], with the sampling given as
f(t) = ∑_{t'∈T} f(t') Ω sinc(Ω(t - t')) Δt.
So, this generally applies to [over-]sampling with redundancy factors 1/(ΩΔt) ≥ 1. In the special case where the sampling is tight with ΩΔt = 1, then it reduces to the form
f(t) = ∑_{t'∈T} f(t') sinc({t - t'}/Δt).
In our case, supp f̂ = [10 Hz., 150 Hz.] ⊆ [-150 Hz., +150 Hz.], so the tightest fit of this symmetric form is with 1/Δt = Ω = 300 Hz.
This generalizes to equally-spaced sampled domains of the form T ≡ { t₀ + kΔt: k ∈ ℤ } without any modification.
But it also generalizes to frequency intervals supp f̂ = [ν₋,ν₊] of width Ω = ν₊ - ν₋ and center ν₀ = ½ (ν₋ + ν₊) to the following form:
f(t) = ∑_{t'∈T} f(t') 1^{ν₀(t - t')} Ω sinc(Ω(t - t')) Δt.
In your case, you have ν₋ = 10 Hz., ν₊ = 150 Hz., Ω = 140 Hz., ν₀ = 80 Hz. with the condition Δt ≤ 1/140 second, a sampling rate of at least 140 Hz. with
f(t) = (140 Δt) ∑_{t'∈T} f(t') 1^{80(t - t')} sinc(140(t - t')).
where t and Δt are in seconds.
There is a larger context to all of this. One of the main places where this can be used is for transforms devised from an overlapping set of windowed filters in the frequency domain - a typical case in point being transforms for the time-scale plane, like the S-transform or the continuous wavelet transform.
Since you want the filters to be smoothly-windowed functions, without sharp corners, then in order for them to provide a complete set that adds up to a finite non-zero value over all of the frequency spectrum (so that they can all be normalized, in tandem, by dividing out by this sum), then their respective supports have to overlap.
(Edit: Generalized this example to cover both equally-spaced and logarithmic-spaced intervals.)
One example of such a set would be filters that have end-point frequencies taken from the set
Π = { p₀ (α + 1)ⁿ + β {(α + 1)ⁿ - 1} / α: n ∈ {0,1,2,⋯} }
So, for interval n (counting from n = 0), you would have ν₋ = p_n and ν₊ = p_{n+1}, where the members of Π are enumerated
p_n = p₀ (α + 1)ⁿ + β {(α + 1)ⁿ - 1} / α,
Δp_n = p_{n+1} - p_n = α p_n + β = (α p₀ + β)(α + 1)ⁿ,
n ∈ {0,1,2,⋯}
The center frequency of interval n would then be ν₀ = p_n + ½ Δp₀ (α + 1)ⁿ and the width would be Ω = Δp₀ (α + 1)ⁿ, but the actual support for the filter would overlap into a good part of the neighboring intervals, so that when you add up the filters that cover a given frequency ν the sum doesn't drop down to 0 as ν approaches any of the boundary points. (In the limiting case α → 0, this produces an equally-spaced frequency domain, suitable for an equalizer, while in the case β → 0, it produces a logarithmic scale with base α + 1, where octaves are equally-spaced.)
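
To make the construction of Π concrete, here is a small sketch that builds the end-point frequencies from the recurrence p_{n+1} = (α + 1) p_n + β (equivalent to Δp_n = α p_n + β) and compares them with the closed form. The parameter values are arbitrary illustrations, and the function names are made up for this sketch.

```python
def band_edges(p0, alpha, beta, count):
    """First `count` end points p_0, p_1, ... via p_{n+1} = (alpha + 1) p_n + beta."""
    edges = [p0]
    for _ in range(count - 1):
        edges.append((alpha + 1) * edges[-1] + beta)
    return edges


def closed_form(p0, alpha, beta, n):
    """p_n = p0 (alpha + 1)^n + beta ((alpha + 1)^n - 1) / alpha, for alpha != 0."""
    growth = (alpha + 1) ** n
    return p0 * growth + beta * (growth - 1) / alpha


def bands(edges):
    """(center nu0, width omega) of each interval [p_n, p_{n+1}]."""
    return [((lo + hi) / 2, hi - lo) for lo, hi in zip(edges, edges[1:])]


octaves = band_edges(10, 1, 0, 4)    # beta = 0: logarithmic scale with base 2
mixed = band_edges(10, 0.5, 3, 6)    # both alpha and beta non-zero
print(octaves)                       # [10, 20, 40, 80]
print(bands(octaves))                # [(15.0, 10), (30.0, 20), (60.0, 40)]
```

With β = 0 and α = 1, each interval is one octave wide, which matches the logarithmic case described above.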
The other main place where you may apply this is to time-frequency analysis and spectrograms. Here, the role of a function f and its Fourier transform f̂ are reversed and the role of the frequency bandwidth Ω is now played by the (reciprocal) time bandwidth 1/Ω. You want to break up a time series, given by a function f(t) into overlapping segments f̃(q,λ) = g(λ)* f(q + λ), with smooth windowing given by the functions g(λ) with bounded support supp g ⊆ [-½ 1/Ω, +½ 1/Ω], and with interval spacing Δq much larger than the time sampling Δt (the ratio Δq/Δt is called the "hop" factor). The analogous role of Δt is played, here, by the frequency interval in the spectrogram Δp = Ω, which is now constant.
Edit: (Fixed the numbers for the Audacity example)
The coarsest admissible spacing for both supp_λ g and supp_λ f̃(q,λ) is Δq = 1/Ω = 1/Δp, and the corresponding redundancy factor is 1/(ΔpΔq). Audacity, for instance, uses a redundancy factor of 2 for its spectrograms. A typical value for Δp might be 44100/2048 Hz., while the time-sampling interval is Δt = 1/(2×3×5×7)² second (corresponding to 1/Δt = 44100 Hz.). With a redundancy factor of 2, Δq would be 1024/44100 second and the hop factor would be Δq/Δt = 1024.
If you try to fit the sampling windows, in either case, to the actual support of the band-limited (or time-limited) function, then the windows won't overlap and the only way to keep their sum from dropping to 0 on the boundary points would be for the windowing functions to have sharp corners on the boundaries, which would wreak havoc on their corresponding Fourier transforms.
The Balian-Low Theorem makes the actual statement on the matter.
And a shout-out to someone I've been talking with, recently, about DSP-related matters and his monograph, which provides an excellent introductory reference to a lot of the issues discussed here.
A Friendly Guide To Wavelets
Gerald Kaiser
Birkhäuser, 1994
He said it's part of a trilogy, another installment of which is forthcoming.

Compute similarity between n entities

I am trying to compute the similarity between n entities that are being described by entity_id, type_of_order, total_value.
An example of the data might look like:
NR  entity_id  type_of_order  total_value
1   1          A              10
2   1          B              90
3   1          C              70
4   2          B              20
5   2          C              40
6   3          A              10
7   3          B              50
8   3          C              20
9   4          B              50
10  4          C              80
My question is: what is a good way of measuring the similarity between, for example, entity_id 1 and 2 with regard to the type_of_order and the total_value for that type of order?
Would a simple KNN give satisfactory results or should I consider other algorithms?
Any suggestion would be much appreciated.
The similarity metric is a heuristic to capture a relationship between two data rows, with respect to the data semantics and the purpose of the training. We don't know your data; we don't know your usage. It would be irresponsible to suggest metrics to solve a problem when we have no idea what problem we're solving.
You have to address this question to the person you find in the mirror. You've given us three features with no idea of what they mean or how they relate. You need to quantify ...
relative distances within features: under type_of_order, what is the relationship (distance) between any two measurements? If we arbitrarily assign d(A, B) = 1, then what is d(B, C)? We have no information to help you construct this. Further, if we give that some value c, then what is d(A, C)? In various popular metrics, it could be 1+c, |1-c|, all distances could be 1, or perhaps it's something else -- even more than 1+c in some applications.
Even in the last column, we cannot assume that d(10, 20) = d(40, 50); the actual difference could be a ratio, difference of squares, etc. Again, this depends on the semantics behind these labels.
relative weights between features: How do the differences in the various columns combine to provide a similarity? For instance, how does d([A, 10], [B, 20]) compare to d([A, 10], [C, 30])? That's two letters in the left column, two steps of 10 in the right column. How about d([A, 10], [A, 20]) vs d([A, 10], [B, 10])? Are the distances linear, or do the relationships change as we slide up the alphabet or to higher numbers?

optimal separating hyperplane objective function confusion

Chapter 4.5.2 of Elements of Statistical Learning
I don't understand what this means:
"Since for any β and β0 satisfying these inequalities, any positively scaled
multiple satisfies them too, we can arbitrarily set ||β|| = 1/M." 
Also, how does maximizing M become minimizing 1/2(||β||^2)?
"Since for any β and β0 satisfying these inequalities, any positively scaled multiple satisfies them too, we can arbitrarily set ||β|| = 1/M." 
y_i(x_i' b + b0) >= M ||b||
thus for any c>0
y_i(x_i' [bc] + [b0c]) >= M ||bc||
so you can always find a c such that ||bc|| = 1/M, and we can restrict attention to those b that have this norm (we simply limit the space of possible solutions, because positive scaling does not change which inequalities hold)
Also, how does maximizing M become minimizing 1/2(||β||^2)?
We put ||b|| = 1/M, thus M=1/||b||
max_b M = max_b 1 / ||b||
now, maximizing a positive f(b) is equivalent to minimizing 1/f(b), so
min ||b||
and since ||b|| is positive, minimizing it is equivalent to minimizing its square, and multiplying by 1/2 does not change the optimal b:
min 1/2 ||b||^2
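
Put together, the chain of steps above can be written out as follows. This is only a restatement of the argument in LaTeX notation, using β and β₀ as in the book and N for the number of training points (N is a symbol assumed here):

```latex
\begin{align*}
&\max_{\beta,\beta_0,M} \; M
  \quad\text{subject to}\quad
  y_i\,(x_i^T\beta + \beta_0) \;\ge\; M\,\|\beta\|, \qquad i = 1,\dots,N \\[4pt]
&\text{rescale } (\beta,\beta_0) \text{ so that } \|\beta\| = 1/M:
  \qquad y_i\,(x_i^T\beta + \beta_0) \;\ge\; 1 \\[4pt]
&\max_{\beta,\beta_0} \frac{1}{\|\beta\|}
  \;\Longleftrightarrow\;
  \min_{\beta,\beta_0} \|\beta\|
  \;\Longleftrightarrow\;
  \min_{\beta,\beta_0} \tfrac{1}{2}\,\|\beta\|^2
  \quad\text{subject to}\quad
  y_i\,(x_i^T\beta + \beta_0) \;\ge\; 1
\end{align*}
```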

How to solve this recursion equation T (n) = √2T(n/2) + log n using master theorem?

I know it can be solved with the master method, but how? Please help.
i am not sure if this is correct:
a = sqrt(2)
b = 2
f(n) = log n
log(b) a = log (2) sqrt(2) = 1/2
log n in O[n^(1/2)]
so the runtime of finding the logarithm of a number n is in O{n^(1/2)} (however the Master Theorem can not be applied here)
The solution is in the following threads: Solving master theorem with log n: T(n) = 2T(n/4) + log n
Overall, we see that your recurrence is O(n^(1/2)) and Ω(n^(1/2)) by upper- and lower-bounding your recurrence by larger and smaller recurrences. Therefore, even though the Master Theorem doesn't apply here, you can still use the Master Theorem to claim that the runtime will be Θ(n^(1/2)).
Master's theorem with f(n)=log n
Usually, f(n) must be polynomial for the master theorem to apply - it doesn't apply for all functions. However, there is a limited "fourth case" for the master theorem, which allows it to apply to polylogarithmic functions.
Ralf is not correct in telling you that you can't apply the master theorem.
The only constraints here are that a >= 1 and b > 1; a and b can be irrational and f(n) can be anything.
Log2(sqrt(2)) is 1/2, and log n is O(n^(1/2 - ε)) for any 0 < ε < 1/2, which puts you in the first case, so your solution is Θ(n^0.5).
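
A quick numerical check of the Θ(n^0.5) claim, as a sketch under stated assumptions: the logarithm is taken base 2, n is restricted to powers of two, and the base case T(1) = 1 is chosen arbitrarily (none of these is fixed by the question). If the bound is right, T(n)/sqrt(n) should level off at a constant.

```python
import math


def ratio_to_sqrt_n(k):
    """T(n) / sqrt(n) for n = 2**k, unrolling
    T(n) = sqrt(2) * T(n / 2) + log2(n) from the assumed base case T(1) = 1."""
    t = 1.0
    for j in range(1, k + 1):
        t = math.sqrt(2) * t + j  # log2(2**j) = j
    return t / 2 ** (k / 2)


for k in (10, 20, 40, 80):
    print(k, ratio_to_sqrt_n(k))
```

The printed ratios increase with k but stay bounded, which is consistent with the first case of the master theorem.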

data visualization. 3D, precision, recall, and f-measure. maybe using octave?

I've been running a machine learning algorithm, I have output in the form of Precision, Recall, and F-Measure.
I'd like to graph this data so I can get a clearer conception of how things are really going, but I don't really know how to do that. I suppose I can use Octave? I heard about it in that Andrew Ng course and I've already got it on my machine, but I don't really know how to use it to visualize data.
Does anyone with experience in this know how I might best proceed or some helpful resources on the best way to go about this?
0.011723329425556858 P 0.6000000238418579 R 0.010416666977107525 F1 0.02047781631341665
0.012895662368112544 P 0.6363636255264282 R 0.01215277798473835 F1 0.023850085569817648
0.01406799531066823 P 0.6666666865348816 R 0.013888888992369175 F1 0.027210884568890845
0.015240328253223915 P 0.6153846383094788 R 0.013888888992369175 F1 0.02716468612858015
0.016412661195779603 P 0.6428571343421936 R 0.015625 F1 0.03050847456668239
0.017584994138335287 P 0.6000000238418579 R 0.015625 F1 0.03045685282259509
0.01875732708089097 P 0.5625 R 0.015625 F1 0.030405405405405407
0.01992966002344666 P 0.529411792755127 R 0.015625 F1 0.030354131580674088
0.021101992966002344 P 0.5555555820465088 R 0.0173611119389534 F1 0.03367003527554599
0.022274325908558032 P 0.5263158082962036 R 0.0173611119389534 F1 0.03361344696816966
0.023446658851113716 P 0.5 R 0.0173611119389534 F1 0.033557048526295
0.0246189917936694 P 0.4761904776096344 R 0.0173611119389534 F1 0.03350083906570289
I suppose the first column is some threshold you varied between lines.
The precision-recall graph is precision-vs-recall. Thus we can first retrieve those two columns from your data: (suppose your data are saved in
cat | awk '{print $3,$5}'
You will get only the two columns below, which you can use to initialize a 2D matrix in Octave:
data = [
0.6000000238418579 0.010416666977107525
0.6363636255264282 0.01215277798473835
0.6666666865348816 0.013888888992369175
0.6153846383094788 0.013888888992369175
0.6428571343421936 0.015625
0.6000000238418579 0.015625
0.5625 0.015625
0.529411792755127 0.015625
0.5555555820465088 0.0173611119389534
0.5263158082962036 0.0173611119389534
0.5 0.0173611119389534
0.4761904776096344 0.0173611119389534];
Then, in Octave, the commands below will plot each row as a data point in the graph:
plot(data(:,2), data(:,1), 'x')  % recall on the x-axis, precision on the y-axis
xlabel('Recall'); ylabel('Precision');
Looks like with some threshold increase, you are decreasing precision and the recall stays the same (for example, when threshold = 0.021, 0.022, 0.023, 0.024).
