I would like to iterate over a vector several times:
let my_vector = vec![1, 2, 3, 4, 5];
let mut out_vector = vec![];
for i in my_vector {
for j in my_vector {
out_vector.push(i * j + i + j);
}
}
The j-loop has a "value used here after move" error. I know that I can place an & before the two my_vectors and borrow the vectors, but it is nice to have more than one way to do things. I would like a little insight as well.
Alternatively, I can write the following:
let i_vec = vec![1, 2, 3, 4, 5, 6];
let iterator = i_vec.iter();
let mut out_vec = vec![];
for i in iterator.clone() {
for j in iterator.clone() {
out_vec.push(i * j + i + j);
}
}
I looked at What's the most efficient way to reuse an iterator in Rust?:
Iterators in general are Clone-able if all their "pieces" are Clone-able.
Is the actual heap allocated data a "piece" of the iterator or is it the memory address that points to the heap data the aforementioned piece?
Cloning a slice iterator (this is the type of iterator you get when calling iter() on a Vec or an array) does not copy the underlying data. Both iterators still point to data stored in the original vector, so the clone operation is cheap.
The same is likely true for clonable iterators on other types.
In your case, instead of calling i_vec.iter() and then cloning it, you can also call i_vec.iter() multiple times:
for i in i_vec.iter() {
for j in i_vec.iter() {
which gives the same result but is probably more readable.
Related
Is there a fast (native) method to search for a sequence in a Uint8List?
///
/// Return index of first occurrence of seq in list
///
int indexOfSeq(Uint8List list, Uint8List seq) {
...
}
EDIT: Changed List<int> into Uint8List
No. There is no built-in way to search for a sequence of elements in a list.
I am also not aware of any dart:ffi based implementations.
The simplest approach would be:
extension IndexOfElements<T> on List<T> {
int indexOfElements(List<T> elements, [int start = 0]) {
if (elements.isEmpty) return start;
var end = length - elements.length;
if (start > end) return -1;
var first = elements.first;
var pos = start;
while (true) {
pos = indexOf(first, pos);
if (pos < 0 || pos > end) return -1;
for (var i = 1; i < elements.length; i++) {
if (this[pos + i] != elements[i]) {
pos++;
continue;
}
}
return pos;
}
}
}
This has worst-case time complexity O(length*elements.length). There are several more algorithms with better worst-case complexity, but they also have larger constant factors and more expensive pre-computations (KMP, BMH). Unless you search for the same long list several times, or do so in a very, very long list, they're unlikely to be faster in practice (and they'd probably have an API where you compile the pattern first, then search with it.)
You could use dart:ffi to bind to memmem from string.h as you suggested.
We do the same with binding to malloc from stdlib.h in package:ffi (source).
final DynamicLibrary stdlib = Platform.isWindows
? DynamicLibrary.open('kernel32.dll')
: DynamicLibrary.process();
final PosixMalloc posixMalloc =
stdlib.lookupFunction<Pointer Function(IntPtr), Pointer Function(int)>('malloc');
Edit: as lrn pointed out, we cannot expose the inner data pointer of a Uint8List at the moment, because the GC might relocate it.
One could use dart_api.h and use the FFI to pass TypedData through the FFI trampoline as Dart_Handle and use Dart_TypedDataAcquireData from the dart_api.h to access the inner data pointer.
(If you want to use this in Flutter, we would need to expose Dart_TypedDataAcquireData and Dart_TypedDataReleaseData in dart_api_dl.h https://github.com/dart-lang/sdk/issues/40607 I've filed https://github.com/dart-lang/sdk/issues/44442 to track this.)
Alternatively, could address https://github.com/dart-lang/sdk/issues/36707 so that we could just expose the inner data pointer of a Uint8List directly in the FFI trampoline.
I am working with a big shared memory matrix of 1.3e6x1.3e6 in a foreach loop. I create that matrix with FBM function of bigstatsr package.
I need the results of the loop in the FBM class object to not run out of RAM memory.
This is what I want to do without FBM class object.
library(doParallel)
library(foreach)
library("doFuture")
cl=makeCluster(2)
registerDoParallel(cl
)
registerDoFuture()
plan(multicore)
results=foreach(a=1:4,.combine='cbind') %dopar% {
a=a-1
foreach(b=1:2,.combine='c') %dopar% {
return(10*a + b)
}
}
And this is how I try it
library(bigstatsr)
results=FBM(4,4,init=0)
saveinFBM=function(x,j){results[,j]=x}
foreach(a=1:4,.combine='savinFBM') %dopar% {
a=a-1
foreach(b=1:2,.combine='c') %dopar% {
return(10*a + b)
}
}
Error in get(as.character(FUN), mode = "function", envir = envir) :
object 'savinFBM' of mode 'function' was not found
PS: Could anybody add the tag "dofuture"?
If I understand correctly what you want to do, a faster alternative is using outer(1:2, 1:4, function(b, a) 10 * (a - 1) + b).
If you want to fill an FBM with this function, you can do:
library(bigstatsr)
X <- FBM(200, 400)
big_apply(X, a.FUN = function(X, ind) {
X[, ind] <- outer(rows_along(X), ind, function(b, a) 10 * (a - 1) + b)
NULL
})
Usually, using parallelism won't help when you write data on disk (what you do when you fill X[, ind]), but it you really want to try, you can use ncores = nb_cores() as additional argument of big_apply().
Imagine a gfor with a seq j...
If I need to use the value of the instance j as a index, who can I do that?
something like:
vector<double> a(n);
gfor(seq j, n){
//Do some calculation and save this on someValue
a[j] = someValue;
}
Someone can help me (again) ?
Thanks.
I've found a solution for this...
if someone had a better option, feel free to post...
First, create a seq with the same size of your gfor instances.
Then, convert that seq in a array.
Now, take the value of that line on array (it's equals the index)
seq sequencia(0, 200);
af::array sqc = sequencia;
//Inside the gfor loop
countLoop = (int) sqc(j).scalar<float>();
Your approach works, but breaks gfors parallelization as converting the index to a scalar forces it to be written from the gpu back to the host, slamming the breaks on the GPU.
You want to do it more like this :
af::array a(200);
gfor(seq j, 200){
//Do some calculation and save this on someValue
a[j] = af::array(someValue); // for someValue a primitive type, say float
}
// ... Now we're safe outside the parallel loop, let's grab the array results
float results[200];
a.host(results) // Copy array from GPU to host, populating a c-type array
The Metal Performance Shader framework provides support for building your own Convolutional Neural Nets. When creating for instance an MSPCNNConvolution it requires a 4D weight tensor as init parameter that is represented as a 1D float pointer.
init(device: MTLDevice,
convolutionDescriptor: MPSCNNConvolutionDescriptor,
kernelWeights: UnsafePointer<Float>,
biasTerms: UnsafePointer<Float>?,
flags: MPSCNNConvolutionFlags)
The documentation has this to say about the 4D tensor
The layout of the filter weight is arranged so that it can be
reinterpreted as a 4D tensor (array)
weight[outputChannels][kernelHeight][kernelWidth][inputChannels/groups]
Unfortunately that information doesn't really tell me how to arrange a 4D array into a one dimensional Float pointer.
I tried ordering the weights like the BNNS counterpart requires it, but without luck.
How do I properly represent the 4D tensor (array) as a 1D Float pointer (array)?
PS: I tried arranging it like a C array and getting the pointer to the flat array, but it didn't work.
UPDATE
#RhythmicFistman: That's how I stored it in a plain array, which I can convert to a UsafePointer<Float> (but doesn't work):
var output = Array<Float>(repeating: 0, count: weights.count)
for o in 0..<outputChannels {
for ky in 0..<kernelHeight {
for kx in 0..<kernelWidth {
for i in 0..<inputChannels {
let offset = ((o * kernelHeight + ky) * kernelWidth + kx) * inputChannels + i
output[offset] = ...
}
}
}
}
Ok so I figured it out. Here are the 2 python functions I use to reform my convolutions and fully connected matrices
# shape required for MPSCNN [oC kH kW iC]
# tensorflow order is [kH kW iC oC]
def convshape(a):
a = np.swapaxes(a, 2, 3)
a = np.swapaxes(a, 1, 2)
a = np.swapaxes(a, 0, 1)
return a
# fully connected only requires a x/y swap
def fullshape(a):
a = np.swapaxes(a, 0, 1)
return a
This is something I recently had to do for Caffe weights, so I can provide the Swift implementation for how I reordered those. The following function takes in a Float array of Caffe weights for a convolution (in [c_o][c_i][h][w] order) and reorders those to what Metal expects ([c_o][h][w][c_i] order):
public func convertCaffeWeightsToMPS(_ weights:[Float], kernelSize:(width:Int, height:Int), inputChannels:Int, outputChannels:Int, groups:Int) -> [Float] {
var weightArray:[Float] = Array(repeating:0.0, count:weights.count)
var outputIndex = 0
let groupedInputChannels = inputChannels / groups
let outputChannelWidth = groupedInputChannels * kernelSize.width * kernelSize.height
// MPS ordering: [c_o][h][w][c_i]
for outputChannel in 0..<outputChannels {
for heightInKernel in 0..<kernelSize.height {
for widthInKernel in 0..<kernelSize.width {
for inputChannel in 0..<groupedInputChannels {
// Caffe ordering: [c_o][c_i][h][w]
let calculatedIndex = outputChannel * outputChannelWidth + inputChannel * kernelSize.width * kernelSize.height + heightInKernel * kernelSize.width + widthInKernel
weightArray[outputIndex] = weights[calculatedIndex]
outputIndex += 1
}
}
}
}
return weightArray
}
Based on my layer visualization, this seems to generate the correct convolution results (matching those produced by Caffe). I believe it also properly takes grouping into account, but I need to verify that.
Tensorflow has a different ordering than Caffe, but you should be able to change the math in the inner part of the loop to account for that.
The documentation here assumes some expertise in C. In that context, a[x][y][z] is typically collapsed into a 1-d array when x, y and z are constants known at compile time. When this happens, the z component varies most quickly, followed by y, followed by x -- outside in.
If we have a[2][2][2], it is collapsed to 1D as:
{ a[0][0][0], a[0][0][1], a[0][1][0], a[0][1][1],
a[1][0][0], a[1][0][1], a[1][1][0], a[1][1][1] }
I think tensorflow already has a convenient method for such task:
tf.transpose(aWeightTensor, perm=[3, 0, 1, 2])
Full documentation: https://www.tensorflow.org/api_docs/python/tf/transpose
I have implemented fft into at32ucb series ucontroller using kiss fft library and currently struggling with the output of the fft.
My intention is to analyse sound coming from piezo speaker.
Currently, the frequency of the sounder is 420Hz which I successfully got from the fft output (cross checked with an oscilloscope). However, the output frequency is just half of expected if I put function generator waveform into the system.
I suspect its the frequency bin calculation formula which I got wrong; currently using, fft_peak_magnitude_index*sampling frequency / fft_size.
My input is real and doing real fft. (output samples = N/2)
And also doing iir filtering and windowing before fft.
Any suggestion would be a great help!
// IIR filter calculation, n = 256 fft points
for (ctr=0; ctr<n; ctr++)
{
// filter calculation
y[ctr] = num_coef[0]*x[ctr];
y[ctr] += (num_coef[1]*x[ctr-1]) - (den_coef[1]*y[ctr-1]);
y[ctr] += (num_coef[2]*x[ctr-2]) - (den_coef[2]*y[ctr-2]);
y1[ctr] = y[ctr] - 510; //eliminate dc offset
// hamming window
hamming[ctr] = (0.54-((0.46) * cos(2*M_PI*ctr/n)));
window[ctr] = hamming[ctr]*y1[ctr];
fft_input[ctr].r = window[ctr];
fft_input[ctr].i = 0;
fft_output[ctr].r = 0;
fft_output[ctr].i = 0;
}
kiss_fftr_cfg fftConfig = kiss_fftr_alloc(n,0,NULL,NULL);
kiss_fftr(fftConfig, (kiss_fft_scalar * )fft_input, fft_output);
peak = 0;
freq_bin = 0;
for (ctr=0; ctr<n1; ctr++)
{
fft_mag[ctr] = 10*(sqrt((fft_output[ctr].r * fft_output[ctr].r) + (fft_output[ctr].i * fft_output[ctr].i)))/(0.5*n);
if(fft_mag[ctr] > peak)
{
peak = fft_mag[ctr];
freq_bin = ctr;
}
frequency = (freq_bin*(10989/n)); // 10989 is the sampling freq
//************************************
//Usart write
char filtResult[10];
//sprintf(filtResult, "%04d %04d %04d\n", (int)peak, (int)freq_bin, (int)frequency);
sprintf(filtResult, "%04d %04d %04d\n", (int)x[ctr], (int)fft_mag[ctr], (int)frequency);
char c;
char *ptr = &filtResult[0];
do
{
c = *ptr;
ptr++;
usart_bw_write_char(&AVR32_USART2, (int)c);
// sendByte(c);
} while (c != '\n');
}
The main problem is likely to be how you declared fft_input.
Based on your previous question, you are allocating fft_input as an array of kiss_fft_cpx. The function kiss_fftr on the other hand expect an array of scalar. By casting the input array into a kiss_fft_scalar with:
kiss_fftr(fftConfig, (kiss_fft_scalar * )fft_input, fft_output);
KissFFT essentially sees an array of real-valued data which contains zeros every second sample (what you filled in as imaginary parts). This is effectively an upsampled version (although without interpolation) of your original signal, i.e. a signal with effectively twice the sampling rate (which is not accounted for in your freq_bin to frequency conversion). To fix this, I suggest you pack your data into a kiss_fft_scalar array:
kiss_fft_scalar fft_input[n];
...
for (ctr=0; ctr<n; ctr++)
{
...
fft_input[ctr] = window[ctr];
...
}
kiss_fftr_cfg fftConfig = kiss_fftr_alloc(n,0,NULL,NULL);
kiss_fftr(fftConfig, fft_input, fft_output);
Note also that while looking for the peak magnitude, you probably are only interested in the final largest peak, instead of the running maximum. As such, you could limit the loop to only computing the peak (using freq_bin instead of ctr as an array index in the following sprintf statements if needed):
for (ctr=0; ctr<n1; ctr++)
{
fft_mag[ctr] = 10*(sqrt((fft_output[ctr].r * fft_output[ctr].r) + (fft_output[ctr].i * fft_output[ctr].i)))/(0.5*n);
if(fft_mag[ctr] > peak)
{
peak = fft_mag[ctr];
freq_bin = ctr;
}
} // close the loop here before computing "frequency"
Finally, when computing the frequency associated with the bin with the largest magnitude, you need the ensure the computation is done using floating point arithmetic. If as I suspect n is an integer, your formula would be performing the 10989/n factor using integer arithmetic resulting in truncation. This can be simply remedied with:
frequency = (freq_bin*(10989.0/n)); // 10989 is the sampling freq