FFT numpy style on iOS Accelerate with non-power-of-two data length

I'm working on reimplementing Python code on iOS (Swift).
I need to do an FFT (numpy style) on chunks of 1D data, each of size 1050 (windowed audio data).
Thankfully, I found a related explanation and code snippet on how to do an iOS FFT in numpy style (link).
However, I'm stuck because the Accelerate framework only supports FFTs on input lengths that are a power of 2 (or, more recently, f * 2^n, where f is 3, 5, or 15 and n is at least 3).
I tested my Python code with a window size of 1050 and it works great for my use case, but it is not straightforward to implement on iOS because of the above limitation.
It is not easy to dig into numpy's C code to see how it handles non-power-of-two lengths. This answer was a good starting point for me, but I still didn't get it.
Speed is also important here, which is why I'm not considering a brute-force DFT.
Any guidance here would be really appreciated.

IIRC, under the hood numpy's fft uses FFTPACK, a C conversion of an old NCAR Fortran math library; the actual numpy FFT is not implemented in Python code. You could very likely compile some FFTPACK C code using Xcode and use a bridging header to call it from iOS Swift code.

Your answers/comments guided me to use C/C++ code to get the desired result (I didn't initially think of that as an option).
I ended up using OpenCV's dft function (which internally implements an FFT); it produces results similar to numpy's fft (and is faster than numpy, according to their docs).
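For reference, a minimal Python sketch of that cross-check (hypothetical random data standing in for one windowed chunk; the equivalent C++ call is cv::dft with the same flags):

import cv2
import numpy as np

window = np.random.rand(1050)                     # stand-in for one windowed audio chunk
src = window.reshape(1, -1).astype(np.float64)    # cv2.dft expects a 2-D floating-point array

# Ask for full complex output so the layout matches numpy's complex spectrum
dst = cv2.dft(src, flags=cv2.DFT_COMPLEX_OUTPUT)  # shape (1, 1050, 2): [..., 0] = re, [..., 1] = im
opencv_fft = dst[0, :, 0] + 1j * dst[0, :, 1]

assert np.allclose(opencv_fft, np.fft.fft(window))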

Related

Fastest way to compute cosine similarity on a GPU

So I have a huge TF-IDF matrix with more than a million records, and I would like to find the cosine similarity of this matrix with itself. I am using Colab to run the code, but I am not sure how to best make use of the GPU provided by Colab.
Sequentially run code:
tfidf_matrix = tf.fit_transform(df['categories'])                # tf is a fitted sklearn TfidfVectorizer
cosine_similarities = linear_kernel(tfidf_matrix, tfidf_matrix)  # sklearn.metrics.pairwise.linear_kernel
Is there a way we can parallelise the code using JIT or some other approach?
Try simple torch code like in this example from the sentence-transformers library: https://github.com/UKPLab/sentence-transformers/blob/master/sentence_transformers/util.py#L31
Or just import the function.
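If you end up writing it yourself, here is a minimal sketch along those lines. It assumes the TF-IDF rows have been converted to a dense torch tensor (or dimensionality-reduced first, e.g. with TruncatedSVD) so they fit in GPU memory; with a million rows the full million-by-million similarity matrix will not fit, so it is computed in blocks and only the top-k matches per row are kept:

import torch

def chunked_cosine_topk(emb, k=10, chunk=2048, device="cuda"):
    # emb: (n, d) float tensor of row vectors (e.g. reduced TF-IDF features)
    emb = torch.nn.functional.normalize(emb.to(device), dim=1)  # unit rows: cosine == dot product
    results = []
    for start in range(0, emb.shape[0], chunk):
        block = emb[start:start + chunk]   # (chunk, d)
        sims = block @ emb.T               # (chunk, n), computed on the GPU
        vals, idx = sims.topk(k, dim=1)    # keep only the k most similar per row
        results.append((vals.cpu(), idx.cpu()))
    return results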
Consider the cuML library, which uses CUDA acceleration:
https://docs.rapids.ai/api/cuml/nightly/api.html

The implementation of backpropagation in the TensorFlow source code (Conv2DBackpropFilter and Conv2DBackpropInput)

Since the two operations Conv2DBackpropFilter and Conv2DBackpropInput account for most of the time in lots of applications (AlexNet/VGG/GAN/Inception, etc.), I am analyzing the complexity of these two back-propagation operations in TensorFlow. I found that there are three implementation versions ("custom", "fast" and "slow") for Conv2DBackpropFilter (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/conv_grad_filter_ops.cc) and Conv2DBackpropInput (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/conv_grad_input_ops.cc). When I profile, all computations go to the "custom" version instead of "fast" or "slow", which directly calls the Eigen function SpatialConvolutionBackwardInput.
The issue is:
Conv2DBackpropFilter uses Eigen's "TensorMap.contract" to do the tensor contraction and Conv2DBackpropInput uses Eigen's "MatrixMap.transpose" to do the matrix transposition in the Compute() function. Besides these two functions, I didn't see any convolution operations, which are theoretically needed for back-propagation. Besides convolutions, what else is run inside these two operations for back-propagation? Does anyone know how to analyze the computational complexity of the back-propagation operations in TensorFlow?
I am looking for any advice/suggestions. Thank you!
In addition to the transposition and contraction, the gradient op for the filter and the gradient op for the input must transform their input using Im2Col and Col2Im respectively. Approximately speaking, these transformations enable the convolution operation to be implemented using tensor contraction. For more information, see the CS231n page on Convolutional Networks (specifically, the paragraphs titled "Implementation as Matrix Multiplication" and "Backpropagation").
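A minimal single-channel, stride-1 numpy sketch of the im2col idea (the TensorFlow kernels do the equivalent with Eigen tensor contractions; this is only to make the "convolution as matrix multiplication" point concrete):

import numpy as np

def im2col_conv2d(x, w):
    # x: (H, W) single-channel input, w: (kh, kw) filter, stride 1, no padding
    H, W = x.shape
    kh, kw = w.shape
    oh, ow = H - kh + 1, W - kw + 1
    # Gather every kh*kw patch into one row -> an (oh*ow, kh*kw) matrix
    cols = np.empty((oh * ow, kh * kw))
    for i in range(oh):
        for j in range(ow):
            cols[i * ow + j] = x[i:i + kh, j:j + kw].ravel()
    # The convolution is now a single matrix multiply (GEMM)
    return (cols @ w.ravel()).reshape(oh, ow)

x = np.random.rand(5, 5)
w = np.random.rand(3, 3)
ref = np.array([[np.sum(x[i:i+3, j:j+3] * w) for j in range(3)] for i in range(3)])
assert np.allclose(im2col_conv2d(x, w), ref)  # matches the direct sliding-window result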
mrry, I got it. That means Conv2D, Conv2DBackpropFilter and Conv2DBackpropInput all work the same way, implementing the convolution as a GEMM via Im2Col/Col2Im. Another issue: when I profile a GAN in TensorFlow, the execution time of Conv2DBackpropInput and Conv2DBackpropFilter is around 4-6 times slower than Conv2D with the same input size. Why?

deeplearning4j with SVHN dataset

I'm trying to build a CNN with deeplearning4j using the SVHN dataset (http://ufldl.stanford.edu/housenumbers/); in particular I'm using
Format 2: Cropped Digits
These are MATLAB files, and each one contains a struct with a 4-D tensor and an array of labels. I would like to load them into my deeplearning4j code, so I looked around and found the class MatlabRecordReader.java in deeplearning4j/DataVec (https://github.com/deeplearning4j/DataVec/blob/master/datavec-api/src/main/java/org/datavec/api/records/reader/impl/misc/MatlabRecordReader.java), but I can't understand how to use it. Does anybody have experience with this?
Thanks in advance
Here is a reference for DataVec: http://deeplearning4j.org/DataVec
And take a look at http://nd4j.org/tensor: all of deeplearning4j's neural nets are written using ND4J ("MATLAB for Java"), so this should be pretty easy to map; you'll see it more or less maps to MATLAB.
What might be easier is to just write the values out as a CSV and reshape them to the proper shape instead. If you use C ordering it should work fine, and then you can just use the CSVRecordReader.
The MatlabRecordReader hasn't been used by many people, and I think it may only work with matrices (it's been a while). I would try the CSV route first.
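A sketch of that conversion in Python (assuming the Format 2 file train_32x32.mat, where X is 32x32x3xN and y is Nx1, with label 10 standing for the digit 0):

import numpy as np
from scipy.io import loadmat

data = loadmat("train_32x32.mat")
X, y = data["X"], data["y"].ravel()
n = X.shape[3]

# Put the sample axis first, then flatten each image in C order so that
# every CSV row can be reshaped back to (32, 32, 3) consistently on the DL4J side.
X = np.transpose(X, (3, 0, 1, 2)).reshape(n, -1)
y = np.where(y == 10, 0, y)            # SVHN uses label 10 for the digit 0

rows = np.column_stack([y, X])         # label in the first column, 3072 pixel values after
np.savetxt("svhn_train.csv", rows, fmt="%d", delimiter=",")

Each row is then 1 + 32*32*3 = 3073 integers, which the CSVRecordReader can pick up directly.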

vDSP: Do the FFT functions include windowing?

I am working on implementing an algorithm using vDSP.
1) take FFT
2) take log of square of absolute value (can be done with lookup table)
3) take another FFT
4) take absolute value
I'm not sure if it is up to me to throw the incoming data through a windowing function before I run the FFT on it.
vDSP_fft_zrip(setupReal, &A, stride, log2n, direction);
That is my FFT call.
Do I need to throw the data through vDSP_hamm_window(...) first?
The iOS Accelerate library function vDSP_fft_zrip() does not include applying a window function (unless you count the implied rectangular window due to the finite length parameter).
So you need to apply your chosen window function (there are many different ones) first.
It sounds like you're doing cepstral analysis and yes, you do need a window function prior to the first FFT. I would suggest a simple Hann or Hamming window.
I don't have any experience with your particular library, but in every other FFT library I know of it's up to you to window the data first. If nothing else, the library can't know what window you wish to use, and sometimes you don't want to use a window (if you're using the FFT for overlap-add filtering, or if you know the signal is exactly periodic in the transform block).
Also, just offhand, it seems like if you're doing 2 FFTs, the overhead of calling a logarithm function is relatively minor.
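Not vDSP code, but a short numpy reference of the four steps with the window applied up front, handy for checking a vDSP port against:

import numpy as np

def cepstrum(frame):
    windowed = frame * np.hanning(len(frame))           # window first (Hann here; Hamming also works)
    spectrum = np.fft.rfft(windowed)                    # 1) FFT
    log_power = np.log(np.abs(spectrum) ** 2 + 1e-12)   # 2) log of squared magnitude (floor avoids log(0))
    return np.abs(np.fft.rfft(log_power))               # 3) second FFT, 4) absolute value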

How to compute SVD using CImg (or maybe the OpenCV or Eigen library)?

Can anyone give me a quick guide on how to use CImg to compute the SVD of a 3-dimensional array?
I just want to get the decomposition of the array in order to compress it and speed up further processing.
Which values do I pass where, and how do I get the output?
I've searched around and still can't understand how it works, and I don't fully understand how SVD works either; I only know that it can be used to compress a matrix.
At the same time, I found that the OpenCV and Eigen libraries can also do the job, so let me know their steps if they are much easier.
(An alternative to SVD for me is PCA; I found source/libraries for it, but I don't know how to use them either.)
Thanks!
See http://cimg.sourceforge.net/reference/structcimg__library_1_1CImg.html#a9a79f3a0849388b3ec13bd140b67a12e
CImg<float> A(3,3);                 // example 3x3 matrix to decompose (A = U'*S*V)
A.rand(0,1);                        // fill with uniform random values in [0,1]
CImgList<float> USV = A.get_SVD();  // USV[0] = U, USV[1] = S, USV[2] = V
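If OpenCV, Eigen or plain numpy is an option, the compression use of the SVD is the same idea in any of them: decompose each 2-D slice of the 3-D array and keep only the top k singular values. A numpy sketch (k is the rank you keep; smaller k means more compression and more error):

import numpy as np

def truncated_svd_compress(volume, k):
    # volume: (H, W, C) array; compress each 2-D slice by keeping the top-k singular values
    out = np.empty(volume.shape, dtype=float)
    for c in range(volume.shape[2]):
        U, s, Vt = np.linalg.svd(volume[:, :, c], full_matrices=False)
        out[:, :, c] = (U[:, :k] * s[:k]) @ Vt[:k, :]   # rank-k reconstruction
    return out

The compression comes from storing only U[:, :k], s[:k] and Vt[:k, :] per slice instead of the full H*W values.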
