What does the "simd" prefix mean in SceneKit? - ios

There is a SCNNode category named SCNNode(SIMD), which declares some properties like simdPosition, simdRotation and so on. It seems these are duplicated properties of the original/normal properties position and rotation.
@property(nonatomic) simd_float3 simdPosition API_AVAILABLE(macos(10.13), ios(11.0), tvos(11.0), watchos(4.0));
@property(nonatomic) simd_float4 simdRotation API_AVAILABLE(macos(10.13), ios(11.0), tvos(11.0), watchos(4.0));
What's the difference between position and simdPosition? What does the prefix "simd" mean exactly?

SIMD: Single Instruction Multiple Data
SIMD instructions allow you to perform the same operation on multiple values at the same time.
Let's see an example
Serial Approach (NO SIMD)
We have these 4 Int32 values
let x0: Int32 = 10
let y0: Int32 = 20
let x1: Int32 = 30
let y1: Int32 = 40
Now we want to sum the 2 x and the 2 y values, so we write
let sumX = x0 + x1 // 40
let sumY = y0 + y1 // 60
In order to perform the 2 previous sums the CPU needs to
load x0 and x1 into registers and add them
load y0 and y1 into registers and add them
So the result is obtained with 2 operations.
I created some graphics to better show you the idea
[Figure: Step 1]
[Figure: Step 2]
SIMD
Now let's see how SIMD works.
First of all, we need the input values stored in the proper SIMD format, so
let x = simd_int2(10, 20)
let y = simd_int2(30, 40)
As you can see, the previous x and y are vectors. In fact, both x and y contain 2 components.
Now we can write
let sum = x + y
Let's see what the CPU does in order to perform the previous operation
load x and y into registers and add them
That's it!
Both components of x and both components of y are processed at the same time.
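To make the semantics concrete, here is a small Python sketch that models what a lane-wise add computes. Plain Python does not emit SIMD instructions, so this only illustrates the result, not the speed. The helper name lane_add is made up for this example.

```python
def lane_add(a, b):
    """Add two equal-length vectors component by component (lane by lane)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of lanes")
    return tuple(ai + bi for ai, bi in zip(a, b))


x = (10, 20)  # same values as simd_int2(10, 20)
y = (30, 40)  # same values as simd_int2(30, 40)

# One conceptual operation produces both sums: (10 + 30, 20 + 40)
total = lane_add(x, y)
```

The two components of the result match sumX and sumY from the serial version.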
Parallel Programming
We are NOT talking about concurrent programming; this is real parallel programming.
As you can imagine, for certain operations the SIMD approach is way faster than the serial one.
Scene Kit
Let's see now an example in SceneKit
We want to add 10 to the x, y and z components of the position of every direct child of the scene's root node.
Using the classic serial approach we can write
for node in scene.rootNode.childNodes {
node.position.x += 10
node.position.y += 10
node.position.z += 10
}
Here a total of childNodes.count * 3 operations is executed.
Now let's see how we can convert the previous code to use the SIMD property
let delta = simd_float3(10)
for node in scene.rootNode.childNodes {
node.simdPosition += delta
}
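This code should be faster than the previous one, since it reads and writes the position once per node instead of three times. I have not measured whether it is 2x or 3x faster, so profile it if the exact figure matters to you.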
Wrap up
If you need to perform the same operation several times on different values, just use the SIMD properties :)

SIMD is a small library built on top of vector types that you can import from <simd/simd.h>. It allows for more expressive and more performant code.
For instance using SIMD you can write
simd_float3 result = a + 2.0 * b;
instead of
SCNVector3 result = SCNVector3Make(a.x + 2.0 * b.x, a.y + 2.0 * b.y, a.z + 2.0 * b.z);
In Objective-C you cannot overload methods. That is, you cannot have both
@property(nonatomic) SCNVector3 position;
@property(nonatomic) simd_float3 position API_AVAILABLE(macos(10.13), ios(11.0), tvos(11.0), watchos(4.0));
The new SIMD-based API needed a different name, and that's why SceneKit exposes simdPosition.

Related

Interpolating a triangle

I am currently developing a grid for a simple simulation and I have been tasked with interpolating some values tied to vertices of a triangle.
So far I have this:
let val1 = 10f
let val2 = 15f
let val3 = 12f
let point1 = Vector2(100f, 300f), val1
let point2 = Vector2(300f, 102f), val2
let point3 = Vector2(100f, 100f), val3
let points = [point1; point2; point3]
let find (points : (Vector2*float32) list) (pos : Vector2) =
let (minX, minXv) = points |> List.minBy (fun (v, valu) -> v.X)
let (maxX, maxXv) = points |> List.maxBy (fun (v, valu)-> v.X)
let (minY, minYv) = points |> List.minBy (fun (v, valu) -> v.Y)
let (maxY, maxYv) = points |> List.maxBy (fun (v, valu) -> v.Y)
let xy = (pos - minX)/(maxX - minX)*(maxX - minX)
let dx = ((maxXv - minXv)/(maxX.X - minX.X))
let dy = ((maxYv - minYv)/(maxY.Y - minY.Y))
((dx*xy.X + dy*xy.Y)) + minXv
The function takes a list of points forming a triangle. I find the minimum and maximum X and Y, along with the values tied to them.
The problem is that this approach only works with a right-angled triangle. With an equilateral triangle the mid point will end up having a higher value at its vertex than the value that is set.
So I guess the approach here is to essentially project a right-angled triangle and create some sort of transformation matrix between any triangle and this projected triangle?
Is this correct? If not, then any pointers would be most appreciated!
You probably want a linear interpolation where the interpolated value is the result of a function of the form
f(x, y) = a*x + b*y + c
If you consider this in 3d, with (x,y) a position on the ground and f(x,y) the height above it, this formula will give you a plane.
To obtain the parameters you can use the points you have:
f(x1, y1) = x1*a + y1*b + 1*c = v1
f(x2, y2) = x2*a + y2*b + 1*c = v2
f(x3, y3) = x3*a + y3*b + 1*c = v3
or, in matrix form:
⎛x1 y1 1⎞   ⎛a⎞   ⎛v1⎞
⎜x2 y2 1⎟ * ⎜b⎟ = ⎜v2⎟
⎝x3 y3 1⎠   ⎝c⎠   ⎝v3⎠
This is a 3×3 system of linear equations: three equations in three unknowns.
You can solve this in a number of ways, e.g. using Gaussian elimination, the inverse matrix, Cramer's rule or some linear algebra library. A numerics expert may tell you that there are differences in the numeric stability between these approaches, particularly if the corners of the triangle are close to lying on a single line. But as long as you're sufficiently far away from that degenerate situation, it probably doesn't make a huge practical difference for simple use cases. Note that if you want to interpolate values for multiple positions relative to a single triangle, you'd only compute a,b,c once and then just use the simple linear formula for each input position, which might lead to a considerable speed-up.
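As a rough illustration of the above, here is a Python sketch that solves the system with Cramer's rule, using the three points from the question. The function names (det3, fit_plane, interpolate) and the collinearity threshold are my own choices for the example, not part of any library.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_plane(p1, p2, p3):
    """Return (a, b, c) such that a*x + b*y + c == v at each (x, y, v) corner."""
    rows = [[p[0], p[1], 1.0] for p in (p1, p2, p3)]
    values = [p[2] for p in (p1, p2, p3)]
    d = det3(rows)
    if abs(d) < 1e-12:  # assumed threshold for the degenerate case
        raise ValueError("corners are (nearly) collinear")
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace one column with the right-hand side.
        replaced = [row[:] for row in rows]
        for r in range(3):
            replaced[r][col] = values[r]
        coeffs.append(det3(replaced) / d)
    return tuple(coeffs)


def interpolate(coeffs, x, y):
    """Evaluate f(x, y) = a*x + b*y + c."""
    a, b, c = coeffs
    return a * x + b * y + c


# Corners from the question as (x, y, value).
corners = [(100.0, 300.0, 10.0), (300.0, 102.0, 15.0), (100.0, 100.0, 12.0)]
plane = fit_plane(*corners)  # compute a, b, c once per triangle
```

After that, interpolate(plane, x, y) is just two multiplications and two additions per query point, which is the speed-up mentioned above.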
Advanced info: For some applications, linear interpolation is not good enough, but to find something more appropriate you would need to provide more data than your question suggests is available. One example that comes to my mind is triangle meshes for 3d rendering. If you use linear interpolation to map the triangles to texture coordinates, then they will line up along the edges but the direction of the mapping can change abruptly, leading to noticeable seams. A kind of projective interpolation or weighted interpolation can avoid this, as I learned from a paper on conformal equivalence of triangle meshes (Springborn, Schröder, Pinkall, 2008), but for that you need to know how the triangle in world coordinates maps to the triangle in texture coordinates, and you also need the triangle mesh and the correspondence to the texture to be compatible with this mapping. Then you'd map in such a way that you not only transport corners to corners, but also circumcircle to circumcircle.

Vector Matrix multiplication via ARM NEON

I have a task: multiply a big row vector (10 000 elements) by a big column-major matrix (10 000 rows, 400 columns). I decided to go with ARM NEON since I'm curious about this technology and would like to learn more about it.
Here's a working example of vector matrix multiplication I wrote:
//float* vec_ptr - a pointer to vector
//float* mat_ptr - a pointer to matrix
//float* out_ptr - a pointer to output vector
//int matCols - matrix columns
//int vecRows - vector rows, the same as matrix
for (int i = 0, max_i = matCols; i < max_i; i++) {
for (int j = 0, max_j = vecRows - 3; j < max_j; j+=4, mat_ptr+=4, vec_ptr+=4) {
float32x4_t mat_val = vld1q_f32(mat_ptr); //get 4 elements from matrix
float32x4_t vec_val = vld1q_f32(vec_ptr); //get 4 elements from vector
float32x4_t out_val = vmulq_f32(mat_val, vec_val); //multiply vectors
float32_t total_sum = vaddvq_f32(out_val); //sum elements of vector together
out_ptr[i] += total_sum;
}
vec_ptr = &myVec[0]; //switch ptr back again to zero element
}
The problem is that it takes a very long time to compute: 30 ms on an iPhone 7+, when my goal is 1 ms or even less if possible. The current execution time is understandable, since the inner multiplication runs 400 * (10000 / 4) = 1 000 000 times.
I also tried processing 8 elements at a time instead of 4. It seems to help, but the numbers are still very far from my goal.
I understand that I might be making some horrible mistakes, since I'm a newbie with ARM NEON. I would be happy if someone could give me some tips on how to optimize my code.
Also, is it worth doing big vector-matrix multiplication with ARM NEON? Is this technology a good fit for that purpose?
Your code is completely flawed: it iterates 16 times assuming both matCols and vecRows are 4. What's the point of SIMD then?
And the major performance problem lies in float32_t total_sum = vaddvq_f32(out_val);:
You should never convert a vector to a scalar inside a loop, since it causes a pipeline hazard that costs around 15 cycles every time.
The solution:
float32x4x4_t myMat;
float32x2_t myVecLow, myVecHigh;
myVecLow = vld1_f32(&pVec[0]);
myVecHigh = vld1_f32(&pVec[2]);
myMat = vld4q_f32(pMat);
myMat.val[0] = vmulq_lane_f32(myMat.val[0], myVecLow, 0);
myMat.val[0] = vmlaq_lane_f32(myMat.val[0], myMat.val[1], myVecLow, 1);
myMat.val[0] = vmlaq_lane_f32(myMat.val[0], myMat.val[2], myVecHigh, 0);
myMat.val[0] = vmlaq_lane_f32(myMat.val[0], myMat.val[3], myVecHigh, 1);
vst1q_f32(pDst, myMat.val[0]);
Compute all the four rows in a single pass
Do a matrix transpose (rotation) on-the-fly by vld4
Do vector-scalar multiply-accumulate instead of vector-vector multiply and horizontal add that causes the pipeline hazards.
You were asking if SIMD is suitable for matrix operations? A simple "yes" would be a monumental understatement. You don't even need a loop for this.
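For readers who want to check the arithmetic, here is a plain Python model of the vector-scalar multiply-accumulate idea, generalised from the 4x4 case to any size. It is only a reference for the results, not NEON code: the function name and the 4-column blocking are assumptions for the sketch, and on real hardware a column-major matrix would still need the transposing loads shown above.

```python
def vec_times_colmajor(vec, mat, rows, cols):
    """Multiply a row vector (length rows) by a column-major rows x cols matrix."""
    out = [0.0] * cols
    for c0 in range(0, cols, 4):
        width = min(4, cols - c0)
        # One accumulator lane per output column, like one 4-lane register.
        acc = [0.0] * width
        for r in range(rows):
            s = vec[r]  # the scalar that is broadcast to every lane
            for k in range(width):
                # Element (r, c0 + k) of a column-major matrix.
                acc[k] += s * mat[(c0 + k) * rows + r]
        # Lanes are stored directly; no horizontal add is ever needed.
        out[c0:c0 + width] = acc
    return out
```

Each accumulator lane belongs to a different output element, so the result is written out lane by lane and the vector-to-scalar conversion disappears from the loop.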

Adding ease in and out to moving platforms with spritekit for 2d platformer

Well Hello,
I'm making a 2d platformer for iOS using SpriteKit. I have moving platforms to allow my characters to move with the platform.
I can't just use SKActions to move my platforms because the character will not move with the platform.
Question:
How would I add an ease-in/ease-out function to the platforms, to simulate SKActionTimingMode.easeInEaseOut?
Current Solution:
I don't have the code in front of me, but for a left/right moving platform this is pretty much what I'm doing. This would be running within the platforms update() method.
if platform.position.x < xPositionIWantNodeToStopGoingLeft {
    velAmount = -velAmount
}
else if platform.position.x > xPositionIWantNodeToStopGoingRight {
    velAmount = -velAmount
}
platform.physicsBody?.velocity = CGVector(dx: velAmount, dy: velAmount)
platform.position.y = staticYPosition
Just to clarify, this works great. If there is a better way to do this I'm all ears. But this creates a jagged stop and turn kind of feel. I want that ease in and out feel so that the platform feels more natural.
Thanks for any help!!!
Ease in out function
Consider the time for the platform to move from one side to the other as one unit (it might be 10 seconds, or 17 frames; it does not matter, we work in units for now).
We do the same with the distance. The platform must move one unit distance in one unit of time.
For this answer, time is t and the position is a function of time, written f(t): the platform position at time t.
For simple linear movement the function is simply f(t)=t. So at time t=0 the distance moved is 0, at time 0.5 (half way) the distance is 0.5 (half way), and so on.
So let's put that into something a little more practical.
Please excuse my Swift; I have never used it before (I am sure you can correct any syntax I get wrong).
// First normalise the distance and time (make them one unit long)
// get the distance (positive, measured from the left stop to the right stop)
let distance = Double(xPositionStopGoingRight - xPositionStopGoingLeft)
// use that and the velocity to get the time to travel
let timeToTravel = distance / Double(velAmountX)
// first we have a frame ticker
gameTick += 1 // that ticks for every frame
// We can assume that the platform is always moving back and forth
// `now` is the unit time: at now = 2 the platform has moved there and back,
// at 3 it has moved across again and at 4 back again.
let now = Double(gameTick) / timeToTravel // normalise time
// get the remainder of 2, as 0-1 is moving forward and 1-2 is moving back
let phase = now.truncatingRemainder(dividingBy: 2.0)
// We also need the unit time for the function f(t)=t
var t = phase
if phase >= 1 { t = 2 - phase } // reverse for return
// implement the function f(t) = t where f(t) is dist
let dist = t
// and convert back to pixel distance
platform.position.x = CGFloat(dist * distance + Double(xPositionStopGoingLeft))
So that is the linear platform. To change the movement, all we need to do is change the function f(t)=?; in the above, it's the line let dist = t
For an ease in/out there is a handy function that is used in most easing applications: f(t) = t * t / ((t * t) + (1 - t) * (1 - t))
There are some t*t terms, which are powers: t to the power of 2, or t^2. In Swift it's pow(t,2), so rewriting the above as code
let dist = pow(t,2) / (pow(t,2) + pow((1-t),2))
This gives a nice ease at the start and end. As the distance and time travelled are constant, the speed at the middle point t = 0.5 must be greater to catch up with the slow start and end. (Side note: taking the derivative of the above function lets you work out the speed at every point in time: f'(t) = speed(t) = 2(-(t-1)t)^(2-1) / (t^2+(1-t)^2)^2)
This function is so nice: the speed at time 0.5 is 2, the same as the power (for the linear journey it would be 1). A handy property of the function is that the speed at the midway point is always the same as the power. If you want it to move really fast at the midpoint, say 4 times as fast, then you use the power of 4
let dist = pow(t,4) / (pow(t,4) + pow((1-t),4))
If you want it to speed up only a little, say 1.2 times the speed at the center, then the power is 1.2
let dist = pow(t,1.2) / (pow(t,1.2) + pow((1-t),1.2))
So now we can introduce another term, maxSpeed, which is the normalised maximum speed. (Side note: more precisely it is the speed at t=0.5, as it can be slower than 1, but for our needs max speed will do.)
let maxSpeed = Double(velAmountX + 3) / Double(velAmountX); // 3 pixels per frame faster
and the function f(t) = t^m / (t^m + (1-t)^m) where m is maxSpeed.
and as code
let dist = pow(t, maxSpeed) / (pow(t, maxSpeed) + pow((1-t), maxSpeed))
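As a quick sanity check of the claim that the speed at the midpoint equals the power, here is a small Python sketch. The names ease and slope, and the step size h, are my own choices; the slope is only a numerical estimate of the derivative.

```python
def ease(t, m=2.0):
    """Ease in/out curve f(t) = t^m / (t^m + (1 - t)^m) for t in [0, 1]."""
    a = t ** m
    b = (1.0 - t) ** m
    return a / (a + b)


def slope(t, m=2.0, h=1e-6):
    """Central-difference estimate of f'(t), i.e. the normalised speed at t."""
    return (ease(t + h, m) - ease(t - h, m)) / (2.0 * h)
```

Evaluating slope(0.5, m) for m = 2, 4 and 1.2 should give values very close to 2, 4 and 1.2 respectively, while ease(0, m) and ease(1, m) stay at 0 and 1.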
So put that all together
// the next 3 lines can be constants
let distance = Double(xPositionStopGoingRight - xPositionStopGoingLeft)
let timeToTravel = distance / Double(velAmountX)
let maxSpeed = Double(velAmountX + 3) / Double(velAmountX)
gameTick += 1 // that ticks for every frame
let now = Double(gameTick) / timeToTravel // normalise time
let phase = now.truncatingRemainder(dividingBy: 2.0)
var t = phase
if phase >= 1 { t = 2 - phase } // reverse for return
// the next line is the ease function
let dist = pow(t, maxSpeed) / (pow(t, maxSpeed) + pow(1 - t, maxSpeed))
// position the platform
platform.position.x = CGFloat(dist * distance + Double(xPositionStopGoingLeft))
Now you can calculate the position of the platform at any tick. If you want to slow the whole game down and step frames at half ticks, it will still work. If you speed the game up with gameTick += 2, it still works.
Also, the max speed can be lower than the linear speed. If you want the platform to move at half the normal speed at the center (t=0.5), then set maxSpeed = 0.5 and the speed at the halfway point will be half. To keep everything working, the ease at the start and end will be quicker: a rush in and a rush out. (It works for the reverse direction as well.)
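To experiment with the numbers outside the game, the whole calculation can be sketched as one Python function. This is a model under a few assumptions: the distance is measured as right minus left, the speed is in pixels per tick, and the name platform_x and its parameters are invented for the example.

```python
def platform_x(tick, left, right, speed, m=2.0):
    """X position of a back-and-forth platform at a given frame tick."""
    distance = right - left
    time_to_travel = distance / speed       # ticks needed for one crossing
    phase = (tick / time_to_travel) % 2.0   # 0..1 outbound, 1..2 returning
    t = phase if phase < 1.0 else 2.0 - phase
    a = t ** m
    b = (1.0 - t) ** m
    return left + distance * (a / (a + b))  # eased distance back in pixels
```

With left = 100, right = 200 and speed = 2, one crossing takes 50 ticks, so the platform sits at 100 on tick 0, at 200 on tick 50 and back at 100 on tick 100. Because the position depends only on the tick value, half ticks or skipped ticks give consistent results.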
A visual representation may help.
The image shows the movement of the platform back and forth over time. The distance is about 60 pixels and the time can be 1 minute. So at 1 min it will be on the right, at 2 min on the left, and so on.
Then we normalise the movement and time by looking only at one section of movement.
The graph represents the movement from left to right side, the distance is 1, and the time is 1. It has just been scaled to fit the unit box (1 by 1 box).
The red line represents the linear movement f(t)=t (constant speed). At any point in time, move across until you hit the line, then move down to find the distance travelled.
The green line represents the ease function f(t)=t*t/(t*t+(1-t)*(1-t)) and it works the same way. At any point in time, scan across to find the green line and move down to get the distance. The function f(t) does that for you.
With maxSpeed, the steepness of the line at dist 0.5 changes, with a steeper slope representing faster travel.
For physics, play with the friction and linear damping of the body. You can even use an SKAction run block to reduce or add friction.
You could do something like:
physicsBody.friction = (10 - physicsBody.velocity.dx) > 0 ? (10 - physicsBody.velocity.dx) / 10 : 0
Basically, it applies friction when velocity.dx is below 10; you may want to tweak the 10 to a number of your liking.

Alternative to CMPedometer to calculate number of steps with accelerometer on iOS

CMPedometer is not available on devices older than the iPhone 5S.
CMPedometer StepCounting not Available
Is there an algorithm or sample code that we can use to count steps with the accelerometer on iOS?
Thanks
iOS aside, there is no simple way to build an accurate pedometer using just the accelerometer output; it's just too noisy. Using the output from a gyroscope (where available) to filter the signal would increase the accuracy.
But here's a crude approach to writing code for a pedometer:
- steps are detected as a variation in the acceleration detected on the Z axis. Assuming you know the default acceleration (the impact of gravity), here's how you do it:
float g = (x * x + y * y + z * z) / (GRAVITY_VALUE * GRAVITY_VALUE);
Your threshold is g=1 (this is what you would see when standing still). Spikes in this value represent steps. So all you have to do is count the spikes. Please mind that a simple g>1 will not do, as for one step the g value will increase for a certain amount of time and then go back (if you plot the value over time, it should look like a sine wave when there is a step; essentially you want to count the sine waves).
Mind you that this is just something to get you started; you will have to add more complexity to it to increase accuracy.
Things like:
- hysteresis to avoid false step detection
- filtering the accelerometer output
- figuring out the step intervals
Are not included here and should be experimented with.
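As a starting point for the spike counting and the hysteresis mentioned above, here is a minimal Python sketch. The two thresholds (high and low) and the function name are assumptions picked for illustration, not tuned values, and no filtering is applied to the input.

```python
def count_steps(samples, gravity=1.0, high=1.2, low=1.05):
    """Count spikes in the normalised squared acceleration with two thresholds."""
    steps = 0
    above = False  # True while we are inside a spike that was already counted
    for x, y, z in samples:
        g = (x * x + y * y + z * z) / (gravity * gravity)
        if not above and g > high:
            steps += 1      # rising edge: count the step once
            above = True
        elif above and g < low:
            above = False   # re-arm only after the signal settles near 1
    return steps
```

Using a lower threshold to re-arm than to trigger means that one long spike, or noise wobbling around a single threshold, is counted only once.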
You can detect a step event using accelerometer data from CMMotionManager.
protected CMMotionManager _motionManager;
public event EventHandler<bool> OnMotion;
public double ACCEL_DETECTION_LIMIT = 0.31;
private const double ACCEL_REDUCE_SPEED = 0.9;
// Assumed sampling interval in seconds; the original snippet did not define it.
private const double ACCEL_UPDATE_INTERVAL = 0.1;
private double accel = -1;
private double accelCurrent = 0;
private void StartAccelerometerUpdates()
{
    // Only configure and start updates when the hardware is present.
    if (_motionManager.AccelerometerAvailable)
    {
        _motionManager.AccelerometerUpdateInterval = ACCEL_UPDATE_INTERVAL;
        _motionManager.StartAccelerometerUpdates (NSOperationQueue.MainQueue, AccelerometerDataUpdatedHandler);
    }
}
public void AccelerometerDataUpdatedHandler(CMAccelerometerData data, NSError error)
{
    if (data == null) return; // nothing to process, e.g. on error
    double x = data.Acceleration.X;
    double y = data.Acceleration.Y;
    double z = data.Acceleration.Z;
    double accelLast = accelCurrent;
    accelCurrent = Math.Sqrt(x * x + y * y + z * z);
    double delta = accelCurrent - accelLast;
    // Decay the previous motion estimate and add the newest change.
    accel = accel * ACCEL_REDUCE_SPEED + delta;
    var didStep = OnMotion;
    if (didStep == null) return; // no subscribers
    if (accel > ACCEL_DETECTION_LIMIT)
    {
        didStep (this, true); // made a step
    } else {
        didStep (this, false);
    }
}

Circle estimation from 2D data set

I am doing some computer vision based hand gesture recognising stuff. Here, I want to detect a circle (a circular motion) made by my hand. My initial stages are working fine and I am able to get a blob whose centroid from each frame I am plotting. This is essentially my data set. A collection of 2D co-ordinate points. Now I want to detect a circular type motion and say generate a call to a function which says "Circle Detected". The circle detector will give a YES / NO boolean output.
Here is a sample of the data set I am generating in 40 frames
The x, y values are just plotted to a bitmap image using MATLAB.
My initial hand movement was slow and later I picked up speed to complete the circle within stipulated time (40 frames). There is no hard and fast rule about the number of frames thing but for now I am using a 40 frame sliding window for circle detection (0-39) then (1-40) then (2-41) etc.
I am also calculating the arc-tangent between successive points using:
angle = atan2(prev_y - y, prev_x - x) * 180 / pi;
Now what approach should I take for detecting a circle (This sample image should result in a YES). The angle as I am noticing is not steadily increasing from 0 to 360. It does increase but with jumps here and there.
If you are only interested in full or nearly full circles:
I think that the standard parameter estimation approaches (Hough/RANSAC) won't work very well in this case.
Since you have the frame order, and therefore the distances between consecutive blob centers, you can create a nearly uniform sub-sample of the data (let's say, pick 20 points spaced roughly evenly), calculate the center and measure the distance of all points from that center.
If it is nearly a circle all points will have similar distance from the center.
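A minimal Python sketch of this check could look as follows. It assumes the points have already been resampled to be roughly evenly spaced, and both the function name and the 20% tolerance are assumptions to be tuned on real data.

```python
import math


def looks_like_circle(points, tolerance=0.2):
    """Return True if all points lie at a similar distance from their mean."""
    if len(points) < 3:
        return False
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    dists = [math.hypot(x - cx, y - cy) for x, y in points]
    mean_d = sum(dists) / len(dists)
    if mean_d == 0:
        return False  # all points coincide
    # Relative spread of the distances; small spread means circle-like.
    return (max(dists) - min(dists)) / mean_d <= tolerance
```

Because the center is taken as the mean of the points, a partial arc or a straight stroke gives a large spread and is rejected, which fits the "full or nearly full circles" case.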
If you want to do something slightly more robust, you can:
Compute center (mean) of all points.
Perform gradient descent to update the center: it should be fairly easy and you won't have local minima. The error term I would probably use is max(D) - min(D), where D is the vector of distances between the blob centers and the estimated circle center (but you can use robust statistics instead of max & min).
Evaluate the circle
I would use a least-squares estimation. Numerically you can use the Nelder-Mead method. You get the circle that best approximates your points, and on the basis of the residual error you decide whether to consider the circle valid or not.
With points being the array of points, xc, yc the coordinates of the center and r the radius, this could be an example of the error to minimize:
class Circle
{
private PointF[] _points;
public Circle(PointF[] points)
{
_points = points;
}
public double MinimizeFunction(double xc, double yc, double r)
{
double d, d2, dx, dy, sum;
sum = 0;
foreach(PointF p in _points)
{
dx = p.X - xc;
dy = p.Y - yc;
d2 = dx * dx + dy * dy;
// sum += d2 - r * r;
d = Math.Sqrt(d2) - r;
sum += d * d;
}
return sum;
}
public double ResidualError(double xc, double yc, double r)
{
return Math.Sqrt(MinimizeFunction(xc, yc, r)) / (_points.Length - 3);
}
}
There is a slight difference between the commented-out functional and the uncommented one, but for practical purposes this difference is negligible. From a theoretical point of view, however, the difference is important.
Since you need to supply an initial set of values (xc, yc, r), you can calculate the circle through three points, choosing three points far from each other.
If you need more details on "circle given three points" or Nelder-Mead you can google or ask me here.
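For reference, the "circle given three points" step can be sketched in Python with the usual circumcircle formula. The function name and the collinearity threshold are assumptions for this example; the result is meant to serve as the initial guess (xc, yc, r) for the minimisation.

```python
import math


def circle_from_three_points(p1, p2, p3):
    """Return (xc, yc, r) of the circle through three non-collinear points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    # Twice the signed area of the triangle; zero when the points are collinear.
    d = 2.0 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(d) < 1e-12:  # assumed threshold
        raise ValueError("points are (nearly) collinear")
    s1 = x1 * x1 + y1 * y1
    s2 = x2 * x2 + y2 * y2
    s3 = x3 * x3 + y3 * y3
    xc = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    yc = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d
    return xc, yc, math.hypot(x1 - xc, y1 - yc)
```

Picking three points that are far apart keeps d well away from zero, which is why widely spaced points are suggested above.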
