CNN architectures like DenseNet stress parameter efficiency, which usually results in fewer FLOPs. However, what I am struggling to understand is why this is important. DenseNet, in particular, has low inference speed. Isn't the purpose of a decreased parameter count/FLOPs to decrease inference time? Is there another real-world reason, such as perhaps lower energy use, for these optimizations?
There is a difference between overall inference time and per-parameter/per-FLOP efficiency. Having a lower parameter count or fewer FLOPs does not guarantee higher speed at inference, because overall inference time depends on the architecture and on how the predictions are computed: memory access patterns, network depth, and how well the operations parallelize on the hardware matter as much as the raw operation count.
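For concreteness, here is one way to check this yourself: a minimal sketch (assuming TensorFlow 2.x and the stock tf.keras.applications models) that prints the parameter count next to a measured per-image latency. The absolute numbers depend entirely on your hardware; the point is only that the two columns need not move together.

```python
# Sketch: parameter count is not the same thing as inference latency.
# Assumes TensorFlow 2.x; absolute numbers depend entirely on the hardware.
import time
import numpy as np
import tensorflow as tf

models = {
    "DenseNet121": tf.keras.applications.DenseNet121(weights=None),
    "ResNet50": tf.keras.applications.ResNet50(weights=None),
    "VGG16": tf.keras.applications.VGG16(weights=None),
}

x = np.random.rand(1, 224, 224, 3).astype("float32")

for name, model in models.items():
    model.predict(x, verbose=0)                    # warm-up run
    start = time.perf_counter()
    for _ in range(20):
        model.predict(x, verbose=0)
    latency_ms = (time.perf_counter() - start) / 20 * 1000
    print(f"{name}: {model.count_params() / 1e6:.1f}M params, "
          f"{latency_ms:.1f} ms/image")
```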
I am fine-tuning a VGG16 network on a 32-CPU machine using TensorFlow, with a sparse cross-entropy loss. I have to classify clothing images into 50 classes. After 2 weeks of training this is how the loss is going down, which I feel is very slow convergence. My batch size is 50. Is this normal, or what do you think is going wrong here? Accuracy is also really bad. And now it has crashed with a bad memory allocation error:
terminate called after throwing an instance of 'std::bad_alloc' what(): std::bad_allo
The last line in my log file looks like this:
2016-12-13 08:56:57.162186: step 31525, loss = 232179.64 (1463843.280 sec/batch)
I also tried a Tesla K80 GPU, and after 20 hours of training this is what the loss looks like. All parameters are the same. The worrying part is that using the GPU didn't increase the iteration rate, which means each step takes the same time on the 32-CPU machine with 50 threads as on the Tesla K80.
I definitely need some practical advice here.
Another -- and drastically better -- option is to not use VGG16. If you look at Figure 5 in this paper, you'll note that VGG16 does very badly in terms of accuracy vs. FLOPs (floating-point operations). If you need speed, MobileNet or a reduced-size ResNet will do much better. Even Inception-v2 will outperform VGG in accuracy at much lower computational cost.
This will drastically reduce your training time and memory use.
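As a rough illustration of what that swap could look like, here is a minimal tf.keras sketch (assuming TensorFlow 2.x) of a MobileNetV2 backbone with a 50-class head, keeping the sparse cross-entropy setup from the question; the input size and hyperparameters are placeholders, not recommendations.

```python
# Sketch: fine-tune a MobileNetV2 backbone for 50 clothing classes instead
# of VGG16. Assumes TensorFlow 2.x; hyperparameters are illustrative only.
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False                          # freeze the backbone at first

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(50),                  # 50 classes, raw logits
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-3),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
# model.fit(train_ds, validation_data=val_ds, epochs=10)  # your tf.data pipelines
```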
We know that the data rate is bits per second. It can also be expressed as the baud rate (symbols per second) times the number of bits per symbol. So, to increase the data rate, we can increase the baud rate or we can increase the number of bits per symbol. Why can't we keep increasing these two indefinitely? Can someone explain what happens in these two cases separately?
This is essentially a physics question. We can play all sorts of games with how to physically represent a signal (hence, getting more bits per baud), but at the end of the day you can only physically convey so much information for any given rate of change of a signal. If you want to communicate faster, you have to up the frequency, which means having signals that change faster in time -- and nature ultimately limits how fast you can change the signal.
See:
http://en.wikipedia.org/wiki/Nyquist_rate
This gets even worse when you add noise:
http://en.wikipedia.org/wiki/Shannon%E2%80%93Hartley_theorem
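As a quick numeric illustration of that noise limit, here is a small Python sketch of the Shannon-Hartley capacity C = B * log2(1 + S/N); the bandwidth and SNR figures are made-up examples.

```python
# Shannon-Hartley: channel capacity C = B * log2(1 + S/N).
# The bandwidth and SNR values below are made-up examples.
import math

def capacity_bps(bandwidth_hz, snr_linear):
    return bandwidth_hz * math.log2(1 + snr_linear)

B = 3000                          # a 3 kHz voice-grade channel
for snr_db in (10, 20, 30, 40):
    snr = 10 ** (snr_db / 10)
    print(f"SNR {snr_db:2d} dB -> capacity ~{capacity_bps(B, snr) / 1000:.1f} kbit/s")
```

No matter how many bits you pack into each symbol, the achievable rate only grows logarithmically with signal power once the noise is accounted for.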
Oftentimes in signal processing discussions people talk about the number of points of the FFT (e.g., 512, 1024, 2048), and they also talk about the number of bits of the signal. Another important part of the discussion should be the signals of interest. For example, if one is really only interested in signals below 60 Hz, it seems wasteful for an FFT algorithm to compute the power (Fourier coefficients) at higher frequencies. Is this the case in common implementations of the FFT algorithm? The savings could be quite relevant to someone performing an FFT on a low-powered microcontroller.
You could low-pass filter, decimate, and use a shorter FFT. But if the cost of quality filtering is a large fraction of N log N, it (plus the shorter FFT) may cost as much as just doing the longer FFT and throwing away the unneeded result bins.
You could use a Goertzel filter for just the needed DFT result bins, but again, if you need around log N result bins or more, an optimized full FFT may cost less computation (and also be slightly more accurate). So this is mainly useful if you need far fewer result bins than log N, such as with DTMF decoding on a slow microcontroller.
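Here is a rough SciPy-based sketch of the filter-decimate-then-shorter-FFT route for the "only below 60 Hz matters" example above; the sample rate, decimation factor, and test signal are arbitrary placeholders.

```python
# Sketch: if only content below ~60 Hz matters, low-pass filter + decimate,
# then run a much shorter FFT. Rates and lengths are arbitrary placeholders.
import numpy as np
from scipy import signal

fs = 8000                                     # original sample rate, Hz
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 50 * t) + 0.1 * np.random.randn(t.size)

q = 16                                        # decimate to fs/q = 500 Hz
y = signal.decimate(x, q, ftype="fir")        # anti-alias filter + downsample

X = np.fft.rfft(y)                            # 500-sample FFT instead of 8000
freqs = np.fft.rfftfreq(y.size, d=q / fs)
print(f"peak near {freqs[np.argmax(np.abs(X))]:.1f} Hz")
```

Whether this wins over the long FFT depends, as noted above, on how expensive the anti-aliasing filter is relative to N log N.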
I'm working on an embedded project where I have to drive a transducer at resonance by calculating the phase difference between its voltage and current waveforms and driving that difference to zero by changing the frequency. The current I and voltage V are signals of the same frequency at any instant, but that frequency is not fixed (approximately 47 kHz to 52 kHz). All I have to do is calculate the phase difference between these two signals. Which method will be most effective?
An FFT of the two signals and then the phase difference between the specific components?
Or cross-correlation of two signals?
Or another, if any? Which method will give me the most accurate result, and with what resolution? Does the sampling rate affect the phase-difference resolution (the minimum phase difference that can be sensed)?
I'm new to digital signal processing, so please correct me if I've made any mistakes.
ADDITIONAL DETAILS:
Noise in my system can be white/Gaussian noise (not significant) and harmonics of the fundamental (which might be the significant one in the resonance-mismatch case).
Yes, the 4046 can be a good alternative with switching regulators. I'm working with an NCO/DDS, where I can scale/reshape the sinusoid on an ongoing basis.
Implementing an analog filter would be very complex, as I would require a high-order filter with a steep roll-off for harmonic removal, so I'm choosing a DSP-based filter; it's also easy to work with MATLAB and DSP processors.
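For illustration, a minimal SciPy sketch of the kind of digital band-pass I have in mind around the fundamental; the sample rate and cutoffs here are assumptions, not final values.

```python
# Sketch: digital band-pass around the ~50 kHz fundamental to suppress
# harmonics before phase estimation. Sample rate and cutoffs are assumptions.
import numpy as np
from scipy import signal

fs = 1_000_000                      # assumed 1 MS/s ADC rate
f_lo, f_hi = 40e3, 60e3             # pass the 47-52 kHz band with some margin

# Butterworth band-pass (order-8 prototype), as second-order sections for
# numerical stability. sosfiltfilt is zero-phase, and in any case the same
# filter must be applied to both the V and I channels.
sos = signal.butter(8, [f_lo, f_hi], btype="bandpass", fs=fs, output="sos")

def clean(x):
    return signal.sosfiltfilt(sos, x)
```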
What sampling rate would you suggest for a ~50 kHz (47 kHz to 52 kHz) system to achieve, with an FFT or Goertzel approach, a phase resolution of preferably 0.1 degrees or better? The frequency steps will vary from as small as ~1-2 Hz up to 50-200 Hz.
My frequency is variable (45 kHz to 55 kHz) but will be known to my system. Knowing the phase error for the last fed frequency is most desirable. After the FFT and digital filtering, an IFFT can be performed to obtain less noisy samples for further processing, so I guess the FFT does both tasks.
But I'm wondering about the phase-difference accuracy, because that's the crucial part.
The Goertzel algorithm (http://www.embedded.com/design/configurable-systems/4024443/The-Goertzel-Algorithm) is a fairly efficient tone-detection method that resolves the signal into real and imaginary components. I'll assume you can do the arithmetic to get the phase difference, or just its polarity, as you require.
Resolution versus time constant is a design tradeoff; this article highlights the issues: http://www.mstarlabs.com/dsp/goertzel/goertzel.html
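To make the phase part concrete, here is a minimal Python sketch of the Goertzel recurrence applied to both channels at a known drive frequency; the sample rate, frequency, and test signals are placeholders. The Goertzel output carries a fixed phase factor, but it cancels when both channels are processed identically.

```python
# Sketch: phase difference between V and I at a known drive frequency using
# the Goertzel recurrence. fs, f0 and the test signals are placeholders.
import numpy as np

def goertzel_phase(x, k, N):
    w = 2.0 * np.pi * k / N
    coeff = 2.0 * np.cos(w)
    s_prev = s_prev2 = 0.0
    for sample in x[:N]:
        s = sample + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    real = s_prev - s_prev2 * np.cos(w)       # real/imag of the k-th bin
    imag = s_prev2 * np.sin(w)                # (up to a constant rotation)
    return np.arctan2(imag, real)

fs, f0, N = 1_000_000, 50_000, 1000           # 1 MS/s, 50 kHz, so k = f0*N/fs = 50
t = np.arange(N) / fs
v = np.sin(2 * np.pi * f0 * t)                        # voltage (reference)
i = np.sin(2 * np.pi * f0 * t - np.deg2rad(30))       # current lagging by 30 deg

k = round(f0 * N / fs)
dphi = goertzel_phase(v, k, N) - goertzel_phase(i, k, N)
print(np.rad2deg((dphi + np.pi) % (2 * np.pi) - np.pi))   # ~30 degrees
```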
Additional
"What accuracy can be obtained?"
It depends... upon what you are faced with (signal levels, external noise, etc.), what hardware you have (ADC, processor, etc.), and how you implement your solution (sample rate, numerical precision, etc.). Without the complete picture, I'd only be guessing at what you could achieve, as the Goertzel approach is far from easy.
But I imagine that for a high-school project with good signal levels and low noise, an easier method, using phase comparator 2 (as it locks at zero degrees) of a 4046 PLL (www.nxp.com/documents/data_sheet/HEF4046B.pdf), will likely get you down to a few degrees.
One other issue if you have a high Q transducer is generating a high-resolution frequency. There is a method but that's another avenue.
Yet more
"Harmonics of Fundamental (Which might be significant)"... hmm hence the digital filtering;
but if the sampling rate is too low then there might be a problem with aliasing. Also, mismatched anti-aliasing filters are likely to take your whole error budget. A rule of thumb of ten times sampling frequency seems a bit low, and it being higher it will make the filter design easier.
Spatial windowing addresses off-frequency issues, along with giving higher roll-off and attenuation; it is described in "Sliding Spectrum Analysis" by Eric Jacobsen and Richard Lyons, in Streamlining Digital Signal Processing (http://www.amazon.com/Streamlining-Digital-Signal-Processing-Guidebook/dp/1118278380).
In a previous project, after detecting either carrier I was interested in the timing of frequency changes in immense noise. With carrier phase-generation inconsistencies, the phase error was never quiescent enough to be quantified, so I can't guess any better than you what you might get under your project conditions.
Not to detract from chip's answer (I upvoted it!) but some other options are:
Cross-correlation. Off the top of my head, I am not sure what the performance difference between it and the Goertzel algorithm will be, but both should be doable on an embedded system (a minimal sketch follows after this list).
Ad-hoc methods. For example, I would try something like this: band-pass the signals to eliminate noise, find the peaks, and measure the time difference between the peaks. This will probably be more efficient and, provided you do a reasonable job of throwing out outliers and handling wrap-around, should be extremely robust. The band-pass filters will themselves alter the phase, so you'll have to make sure you apply exactly the same filter to both signals.
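A minimal sketch of the cross-correlation option, using SciPy; the signals and rates are placeholders. Note that the lag, and hence the phase, is quantized to whole sample periods unless you interpolate around the correlation peak, which is one way the sampling rate limits the phase resolution.

```python
# Sketch: phase from the cross-correlation lag between V and I.
# Signals and rates are placeholders.
import numpy as np
from scipy import signal

fs, f0, N = 1_000_000, 50_000, 2000
t = np.arange(N) / fs
v = np.sin(2 * np.pi * f0 * t)
i = np.sin(2 * np.pi * f0 * t - np.deg2rad(30))       # I lags V by 30 deg

corr = signal.correlate(v, i, mode="full")
lags = signal.correlation_lags(v.size, i.size, mode="full")
lag = lags[np.argmax(corr)]            # peak sits at -d when I lags V by d samples
delay = -lag / fs                      # delay of I relative to V, in seconds

phase_deg = (np.rad2deg(2 * np.pi * f0 * delay) + 180) % 360 - 180
print(phase_deg)   # ~36 deg here: the lag is quantized to whole samples,
                   # i.e. steps of 360*f0/fs = 18 degrees at this rate
```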
If the input signal-to-noise ratios are not too bad, a computationally efficient solution can be built based on zero-crossing detection. Also, have a look at http://www.metrology.pg.gda.pl/full/2005/M&MS_2005_427.pdf for a nice comparison of phase-difference detection algorithms, including zero-crossing ones.
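A minimal sketch of the zero-crossing approach; signals and rates are again placeholders. Linear interpolation between the two samples that bracket each rising crossing gives sub-sample timing, so the resolution can be much finer than one sample period when the SNR is good.

```python
# Sketch: phase from interpolated rising zero-crossing times of V and I.
# Signals and rates are placeholders; assumes the inputs are already filtered.
import numpy as np

def rising_zero_crossings(x, fs):
    idx = np.where((x[:-1] < 0) & (x[1:] >= 0))[0]
    frac = -x[idx] / (x[idx + 1] - x[idx])    # linear interpolation
    return (idx + frac) / fs                  # crossing instants in seconds

fs, f0, N = 1_000_000, 50_000, 2000
t = np.arange(N) / fs
v = np.sin(2 * np.pi * f0 * t)
i = np.sin(2 * np.pi * f0 * t - np.deg2rad(30))       # I lags V by 30 deg

tv, ti = rising_zero_crossings(v, fs), rising_zero_crossings(i, fs)
n = min(tv.size, ti.size)
T = 1 / f0
d = (ti[:n] - tv[:n] + T / 2) % T - T / 2     # wrap each delay into +/- half a period
print(np.rad2deg(2 * np.pi * f0 * np.mean(d)))        # ~30 degrees
```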
Computing 1-bin of a DFT (or using the similar complex Goertzel block filter) will work if the signal frequency is accurately known. (Set the DFT bin or the Goertzel to exactly that frequency).
If the frequency isn't exactly known, you could try using an FFT with an FFTshift to interpolate the frequency magnitude peak, and then interpolate the phase at that frequency for each of the two signals. An FFT will also allow you to window the data, which may improve phase estimation accuracy if the frequency isn't exactly bin-centered (or exactly the Goertzel filter frequency). Different windows may improve the phase estimation accuracy for frequencies "between bins". A Blackman-Nuttall window will be better than a rectangular window, but there may be better window choices.
The phase measurement accuracy will depend on the S/N ratio, the length of time one samples the two (assumed stationary) signals, and possibly the window used.
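A minimal sketch of that windowed-FFT approach when the frequency is not exactly known: window both channels identically, locate the magnitude peak, and take the phase difference of the two spectra at that bin; the window's own phase contribution cancels because it is common to both channels. The signals and rates are placeholders, and a plain Blackman window stands in here for the Blackman-Nuttall suggestion.

```python
# Sketch: FFT-based phase difference when the tone is not bin-centred.
# Signals, rates, and the window choice are placeholders.
import numpy as np

fs, N = 1_000_000, 4096
f0 = 50_123.0                                  # deliberately not bin-centred
t = np.arange(N) / fs
v = np.sin(2 * np.pi * f0 * t)
i = np.sin(2 * np.pi * f0 * t - np.deg2rad(30))

w = np.blackman(N)                             # same window on both channels
V = np.fft.rfft(v * w)
I = np.fft.rfft(i * w)

k = np.argmax(np.abs(V))                       # coarse magnitude peak
dphi = np.angle(V[k] * np.conj(I[k]))          # phase of V minus phase of I
print(np.rad2deg(dphi))                        # ~30 degrees
```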
If you have a Phase Locked Loop (PLL) that tracks each input, then you can subtract the phase coefficients (of the generator components) to determine offset between the phases. This would also be robust against noise.