Shaping input for a Core ML model - iOS

I have a Core ML model taking an input of shape MultiArray (Float32 67 x 256 x 320).
I am having a hard time shaping the input for this model.
Currently, I am trying to achieve it like so:
var m = try! MLMultiArray(shape: [67, 256, 320], dataType: .double)
for i in 0...66 {
    var cost = rand((256, 320)) // this is coming from swix [SWIX]
    memcpy(m.dataPointer + i*256*320, &cost.flat.grid, 256*320)
}
I will have to replace the rand with matrices of that size later; I am using it for testing purposes first.
Any pointers on how to mould the input to fit the expected volume would be greatly appreciated.
[SWIX]

What seems to be wrong in your code is that you're copying bytes instead of doubles. A double is 8 bytes, so your offset should be i*256*320*MemoryLayout<Double>.stride and the amount you're copying should be 256*320*MemoryLayout<Double>.stride.
Note that you can also use the MLMultiArray's strides property to compute the offset for a given data element in the array:
let offset = i0 * strides[0].intValue + i1 * strides[1].intValue + i2 * strides[2].intValue
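If it helps to sanity-check the layout outside Swift, the same bookkeeping can be done in Python with numpy, whose strides are expressed in bytes; here is a minimal sketch (the model filename and the input name "input" are assumptions for illustration):

import numpy as np
import coremltools as ct

# Build the 67 x 256 x 320 volume as doubles, one 256x320 slice at a time
x = np.empty((67, 256, 320), dtype=np.float64)
for i in range(67):
    x[i] = np.random.rand(256, 320)  # stand-in for the real cost matrix

# numpy strides are in bytes: (256*320*8, 320*8, 8) for a C-contiguous
# float64 array, i.e. exactly the i*256*320*MemoryLayout<Double>.stride offset
assert x.strides == (256 * 320 * 8, 320 * 8, 8)

# On macOS, coremltools can run the model directly on the numpy array
model = ct.models.MLModel("model.mlmodel")
print(model.predict({"input": x}))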

How to export all the information from 3d numpy array to a csv file

Kaggle Dataset and code link
I'm trying to solve the above Kaggle problem, and I want to export the preprocessed data as a CSV so that I can build a model in Weka; but when I try to save it to CSV I lose a dimension, and I want to retain all the information in that CSV.
Please help me with the relevant code or any resources.
Thanks
print(scaled_x)
        x         y         z  label
 1.485231 -0.661030 -1.194153      0
 0.888257 -1.370361 -0.829636      0
 0.691523 -0.594794 -0.936247      0
import numpy as np
from scipy import stats

Fs = 20
frame_size = Fs * 4  # 80
hop_size = Fs * 2    # 40

def get_frames(df, frame_size, hop_size):
    N_FEATURES = 3
    frames = []
    labels = []
    for i in range(0, len(df) - frame_size, hop_size):
        x = df['x'].values[i: i + frame_size]
        y = df['y'].values[i: i + frame_size]
        z = df['z'].values[i: i + frame_size]
        label = stats.mode(df['label'][i: i + frame_size])[0][0]
        frames.append([x, y, z])
        labels.append(label)
    frames = np.asarray(frames).reshape(-1, frame_size, N_FEATURES)
    labels = np.asarray(labels)
    return frames, labels
x,y = get_frames(scaled_x, frame_size, hop_size)
x.shape, y.shape
((78728, 80, 3), (78728,))
According to the link you posted, the data is times series accelerometer/gyro data sampled at 20 Hz, with a label for each sample. They want to aggregate the time series into frames (with the corresponding label being the most common label during a given frame).
So frame_size is the number of samples in a frame, and hop_size is the amount the sliding window moves forward each iteration. In other words, the frames overlap by 50% since hop_size = frame_size / 2.
Thus at the end you get a 3D array of 78728 frames of length 80, with 3 values (x, y, z) each.
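As a toy illustration of that sliding window (made-up numbers, not the dataset's):

# frame_size=4 with hop_size=2 gives 50% overlap, like the question's 80/40
frame_size, hop_size = 4, 2
data = list(range(10))
frames = [data[i:i + frame_size]
          for i in range(0, len(data) - frame_size, hop_size)]
print(frames)  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7]]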
EDIT: To answer your new question about how to export as CSV, you'll need to "flatten" the 3D frame array to a 2D array since that's what a CSV represents. There are multiple different ways to do this but I think the easiest may just be to concatenate the final two dimensions, so that each row is a frame, consisting of 240 values (80 samples of 3 co-ordinates each). Then concatenate the labels as the final column.
x_2d = np.reshape(x, (x.shape[0], -1))
full = np.concatenate([x_2d, y[:, None]], axis=1)
import pandas as pd
df = pd.DataFrame(full)
df.to_csv("frames.csv")
If you also want proper column names:
columns = []
for i in range(1, x.shape[1] + 1):
    columns.extend([f"{i}_X", f"{i}_Y", f"{i}_Z"])
columns.append("label")
df = pd.DataFrame(full, columns=columns)
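To convince yourself that no information was lost, you can round-trip the file; a quick sketch, assuming frames.csv was written with the named columns above:

import numpy as np
import pandas as pd

df2 = pd.read_csv("frames.csv", index_col=0)
labels = df2["label"].to_numpy()
frames = df2.drop(columns="label").to_numpy().reshape(-1, 80, 3)
print(frames.shape, labels.shape)  # (78728, 80, 3) (78728,)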

Which sigmoid function to use for increasing contrast of an image?

Is there a standard sigmoidal function used to increase the contrast in a gray level bitmap?
Currently I am using the following. It is applied to gray levels represented as values between 0 and 1 inclusive.
static double ContrastCurve(double val, double k = 1)
{
    Func<double, double> logistic_func = (double x) => 1.0 / (1.0 + Math.Exp(-k * (x - 0.5)));
    var low = logistic_func(0);
    var high = logistic_func(1);
    var range = high - low;
    var value = logistic_func(val);
    return (value - low) / range;
}
This is the logistic function applied to a value between 0 and 1, with the output normalized so that it also lies in [0...1]. The function works, but it is completely arbitrary, something I just made up, so the k parameter has no official name or meaning in the image-processing literature.
If there is a standard function I would prefer that, but I haven't found anything that seems definitive. Code such as this link seems just as ad hoc to me.
As Mark Setchell's comment notes, ImageMagick uses the following function citing "Fundamentals of Image Processing", Hany Farid:
g(u) = 1 / [1 + exp(-α*u + β)]
scaled such that for domain [0..1] its range is [0..1].
This is essentially a two-parameter version of the function defined in the code above: that code implements the same function with the substitution α = k and β = k/2, which yields a one-parameter function f satisfying f(0.5) = 0.5 when scaled so that f(0) = 0 and f(1) = 1.
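For reference, here is a sketch of that two-parameter curve in Python (numpy; the parameter values are illustrative only):

import numpy as np

def sigmoidal_contrast(u, alpha, beta):
    g = lambda t: 1.0 / (1.0 + np.exp(-alpha * t + beta))
    lo, hi = g(0.0), g(1.0)  # rescale so the [0, 1] domain maps onto [0, 1]
    return (g(u) - lo) / (hi - lo)

# alpha = k, beta = k/2 reproduces the one-parameter curve from the question
u = np.linspace(0.0, 1.0, 5)
print(sigmoidal_contrast(u, alpha=10.0, beta=5.0))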

Need a vectorized solution in pytorch

I'm doing an experiment using face images in PyTorch framework. The input x is the given face image of size 5 * 5 (height * width) and there are 192 channels.
Objective: To obtain patches of x of patch_size(given as argument).
I have obtained the required result with the help of two for loops, but I want a better, vectorized solution so that the computational cost is much lower than with two for loops.
Used: PyTorch 0.4.1, (12 GB) Nvidia TitanX GPU.
The following is my implementation using two for loops
import torch

def extractpatches(x, patch_size):  # x is bs x 192 x 5 x 5
    patches = x.unfold(2, patch_size, 1).unfold(3, patch_size, 1)
    bs, c, pi, pj, _, _ = patches.size()
    cnt = 0
    p = torch.empty((bs, pi * pj, c, patch_size, patch_size)).to(device)
    s = torch.empty((bs, pi * pj, c * patch_size * patch_size)).to(device)
    # Want a vectorized method instead of the two for loops below
    for i in range(pi):
        for j in range(pj):
            p[:, cnt, :, :, :] = patches[:, :, i, j, :, :]
            s[:, cnt, :] = p[:, cnt, :, :, :].view(-1, c * patch_size * patch_size)
            cnt = cnt + 1
    return s
Thanks for your help in advance.
I think you can try the following; I used parts of your code for my experiment and it worked for me. Here l and f are lists of tensor patches:
l = [patches[:,:,int(i/pi),i%pi,:,:] for i in range(pi * pi)]
f = [l[i].contiguous().view(-1,c*patch_size*patch_size) for i in range(pi * pi)]
You can verify the above code using toy input values.
Thanks.
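If you want to avoid Python-level loops entirely, a permute/reshape over the unfolded tensor should produce the same layout as the loop version; a sketch built on the same unfold calls as the question:

import torch

def extract_patches_vectorized(x, patch_size):  # x is bs x 192 x 5 x 5
    patches = x.unfold(2, patch_size, 1).unfold(3, patch_size, 1)
    bs, c, pi, pj, _, _ = patches.size()
    # bring the patch-grid dims (pi, pj) in front of the channel dim,
    # then flatten them into one patches-per-image dimension
    p = patches.permute(0, 2, 3, 1, 4, 5).contiguous()
    return p.view(bs, pi * pj, c * patch_size * patch_size)

x = torch.randn(2, 192, 5, 5)
print(extract_patches_vectorized(x, 3).shape)  # torch.Size([2, 9, 1728])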

Linear Interpolation - shrinking a line

Suppose we have a 1D array named Source that consists of 9 elements: Source[0 to 8].
Using "Linear Interpolation" we want to shrink it into a smaller 4 point array: Destination [0 to 3].
This is how I understand the Algorithm:
Calculate the ratio between both array lengths: 9/4 = 2.5
Iterate over the destination coordinates and find the appropriate source coordinate:
Destination [0] = 0 * 2.5 = Source [0] -> Success! use this exact value.
Destination [1] = 1 * 2.5 = Source [2.5] -> No such element! Calculate the average of Source[2] and Source[3].
Destination [2] = 2 * 2.5 = Source [5] -> Success! use this exact value.
Destination [3] = 3 * 2.5 = Source [7.5] -> No such element! Calculate the average of Source[7] and Source[8].
Is this correct ?
Almost correct. 9/4 = 2.25. ;-)
Anyway, if you want to preserve the endpoint values, you should calculate the ratio as (9-1)/(4-1) = 2.666... (Between points 0, 1, 2, 3 there are only three segments, so the length equals 3; the same applies to 0...8, which spans eight segments.)
If you don't hit an exact index, remember to compute a weighted mean, e.g.
Destination[1] = 1 * 2.667 -> (3-2.667)*Source[2] + (2.667-2)*Source[3]
This is from the equation,
y = y0(x1-x) + y1(x-x0)
where, in this case,
x=2.667
x0=2
x1=3
y0=Source[2]
y1=Source[3]
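Putting the whole recipe together, a small Python sketch with the endpoint-preserving ratio:

def shrink(src, n):
    ratio = (len(src) - 1) / (n - 1)  # (9-1)/(4-1) = 2.667 for this example
    out = []
    for i in range(n):
        x = i * ratio
        x0 = min(int(x), len(src) - 2)  # clamp so x1 stays in range
        x1 = x0 + 1
        out.append(src[x0] * (x1 - x) + src[x1] * (x - x0))
    return out

print(shrink(list(range(9)), 4))  # [0.0, 2.666..., 5.333..., 8.0]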

F#/"Accelerator v2" DFT algorithm implementation probably incorrect

I'm trying to experiment with software-defined radio concepts. Based on this article, I've tried to implement a GPU-parallel Discrete Fourier Transform.
I'm pretty sure I could pre-calculate 90 degrees of sin(i) and cos(i) and then just flip and repeat rather than what I'm doing in this code, and that this would speed it up. But so far, I don't even think I'm getting correct answers. An all-zeros input gives a 0 result as I'd expect, but all-0.5 inputs give 78.9985886f (I'd expect a 0 result in this case too). Basically, I'm just generally confused. I don't have any good input data, and I don't know what to do with the result or how to verify it.
This question is related to my other post here
open Microsoft.ParallelArrays
open System
// X64MulticoreTarget is faster on my machine, unexpectedly
let target = new DX9Target() // new X64MulticoreTarget()
ignore(target.ToArray1D(new FloatParallelArray([| 0.0f |]))) // Dummy operation to warm up the GPU
let stopwatch = new System.Diagnostics.Stopwatch() // For benchmarking
let Hz = 50.0f
let fStep = (2.0f * float32(Math.PI)) / Hz
let shift = 0.0f // offset, once we have to adjust for the last batch of samples of a stream
// If I knew that the periodic function is periodic
// at whole-number intervals, I think I could keep
// shift within a smaller range to support streams
// without overflowing shift - but I haven't
// figured that out
//let elements = 8192 // maximum for a 1D array - makes sense as 2^13
//let elements = 7240 // maximum on my machine for a 2D array, but why?
let elements = 7240
// need good data!!
let buffer : float32[,] = Array2D.init<float32> elements elements (fun i j -> 0.5f) //(float32(i * elements) + float32(j)))
let input = new FloatParallelArray(buffer)
let seqN : float32[,] = Array2D.init<float32> elements elements (fun i j -> (float32(i * elements) + float32(j)))
let steps = new FloatParallelArray(seqN)
let shiftedSteps = ParallelArrays.Add(shift, steps)
let increments = ParallelArrays.Multiply(fStep, steps)
let cos_i = ParallelArrays.Cos(increments) // Real component series
let sin_i = ParallelArrays.Sin(increments) // Imaginary component series
stopwatch.Start()
// From the documentation, I think ParallelArrays.Multiply does standard element by
// element multiplication, not matrix multiplication
// Then we sum each element for each complex component (I don't understand the relationship
// of this, or the importance of the generalization to complex numbers)
let real = target.ToArray1D(ParallelArrays.Sum(ParallelArrays.Multiply(input, cos_i))).[0]
let imag = target.ToArray1D(ParallelArrays.Sum(ParallelArrays.Multiply(input, sin_i))).[0]
printf "%A in " ((real * real) + (imag * imag)) // sum the squares for the presence of the frequency
stopwatch.Stop()
printfn "%A" stopwatch.ElapsedMilliseconds
ignore (System.Console.ReadKey())
I share your surprise that your answer is not closer to zero. I'd suggest writing naive code to perform your DFT in F# and seeing if you can track down the source of the discrepancy.
Here's what I think you're trying to do:
let N = 7240
let F = 1.0f/50.0f
let pi = single System.Math.PI
let signal = [| for i in 1 .. N*N -> 0.5f |]
let real =
    seq { for i in 0 .. N*N-1 -> signal.[i] * (cos (2.0f * pi * F * (single i))) }
    |> Seq.sum
let img =
    seq { for i in 0 .. N*N-1 -> signal.[i] * (sin (2.0f * pi * F * (single i))) }
    |> Seq.sum
let power = real*real + img*img
Hopefully you can use this naive code to get a better intuition for how the accelerator code ought to behave, which could guide you in your testing of the accelerator code. Keep in mind that part of the reason for the discrepancy may simply be the precision of the calculations - there are ~52 million elements in your arrays, so accumulating a total error of 79 may not actually be too bad. FWIW, I get a power of ~0.05 when running the above single precision code, but a power of ~4e-18 when using equivalent code with double precision numbers.
Two suggestions:
ensure you're not somehow confusing degrees with radians
try doing it sans-parallelism, or just with F#'s asyncs for parallelism
(In F#, if you have an array of floats
let a : float[] = ...
then you can 'add a step to all of them in parallel' to produce a new array with
let aShift =
    a
    |> Array.map (fun x -> async { return x + shift })
    |> Async.Parallel
    |> Async.RunSynchronously
(though I expect this might be slower than just doing a synchronous loop).)
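As one more cross-check, the same single-bin power can be computed in double precision with numpy; this sketch mirrors the naive F# code above, using a smaller multiple of the 50-sample period to keep memory reasonable:

import numpy as np

M = 1_000_000  # any whole number of 50-sample periods behaves the same way
F = 1.0 / 50.0
n = np.arange(M, dtype=np.float64)
signal = np.full(M, 0.5)
real = np.sum(signal * np.cos(2.0 * np.pi * F * n))
imag = np.sum(signal * np.sin(2.0 * np.pi * F * n))
print(real * real + imag * imag)  # effectively zero in double precision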
