SceneKit rigged character animation: increase performance - iOS

I have *.DAE files for characters, each with 45-70 bones,
and I want to have about 100 animated characters on screen.
However, with ~60 characters the animations take ~13 ms of my update loop, which is very costly and leaves me almost no room for other tasks.
I am adding the animations (CAAnimationGroup) to the mesh SCNNode.
When I want to swap animations, I remove the previous animation with a fade-out of 0.2 and add the new animation with a fade-in of 0.2 as well. Is that bad? Should I instead pause the previous animation and play the new one, or is that even worse?
Is there a better way to animate rigged characters in SceneKit, perhaps on the GPU?
Please point me in the right direction to reduce the animation overhead in my update loop.
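For reference, this is roughly what the swap looks like (a sketch only: scene is the already-loaded character scene, and "Body", "run.dae", "run-1", "walk" and "run" are placeholder names, not the real ones from my project):
// Load the new animation from its DAE file (placeholder identifiers).
SCNNode *meshNode = [scene.rootNode childNodeWithName:@"Body" recursively:YES];
NSURL *url = [[NSBundle mainBundle] URLForResource:@"run" withExtension:@"dae"];
SCNSceneSource *source = [SCNSceneSource sceneSourceWithURL:url options:nil];
CAAnimation *runAnimation = [source entryWithIdentifier:@"run-1" withClass:[CAAnimation class]];
runAnimation.fadeInDuration = 0.2;   // SceneKit's additions to CAAnimation

// Cross-fade: remove the old animation with a fade-out while the new one fades in.
[meshNode removeAnimationForKey:@"walk" fadeOutDuration:0.2];
[meshNode addAnimation:runAnimation forKey:@"run"];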
Update
After contacting Apple via a bug report (Radar), I received this reply by e-mail:
This issue is being worked on to be fixed in a future update, we will let you know as soon as we have a beta build you can test and verify this issue.
Thank you for your patience.
So let's wait and see how much Apple's engineers improve it :).

SceneKit performs the skeletal animation on the GPU only if your vertices have no more than 4 bone influences. From the docs, reproduced below:
SceneKit performs skeletal animation on the GPU only if the componentsPerVector count in this geometry source is 4 or less. Larger vectors result in CPU-based animation and drastically reduced rendering performance.
I have used the following code to detect if the animation is done on the GPU:
- (void)checkGPUSkinningForInScene:(SCNScene*)character
                          forNodes:(NSArray*)skinnedNodes {
    for (NSString* nodeName in skinnedNodes) {
        SCNNode* skinnedNode =
            [character.rootNode childNodeWithName:nodeName recursively:YES];
        SCNSkinner* skinner = skinnedNode.skinner;
        NSLog(@"******** Skinner for node %@ is %@ with skeleton: %@",
              skinnedNode.name, skinner, skinner.skeleton);
        if (skinner) {
            SCNGeometrySource* boneIndices = skinner.boneIndices;
            SCNGeometrySource* boneWeights = skinner.boneWeights;
            NSInteger influences = boneWeights.componentsPerVector;
            if (influences <= 4) {
                NSLog(@" This node %@ with %ld influences is skinned on the GPU",
                      skinnedNode.name, (long)influences);
            } else {
                NSLog(@" This node %@ with %ld influences is skinned on the CPU",
                      skinnedNode.name, (long)influences);
            }
        }
    }
}
You pass in the SCNScene and the names of the nodes that have an SCNSkinner attached to check whether the animation is done on the GPU or the CPU.
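For example (the scene file and node name here are placeholders):
// Replace "character.dae" and "Body" with your own scene and skinned node name.
SCNScene *characterScene = [SCNScene sceneNamed:@"character.dae"];
[self checkGPUSkinningForInScene:characterScene forNodes:@[ @"Body" ]];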
However, there is one other hidden constraint on GPU animation: if your skeleton has more than 60 bones, the skinning won't be executed on the GPU. The trick to discover this is to print the default vertex shader by attaching an invalid shader modifier entry, as explained in this post.
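One way to trigger that dump (a sketch; skinnedNode is assumed to be a node with an SCNSkinner attached) is to attach a deliberately broken geometry shader modifier, so the resulting compile error makes SceneKit log the full generated shader source:
// Deliberately invalid shader modifier: the compile error causes SceneKit to
// print the complete default vertex shader it generated to the console.
skinnedNode.geometry.shaderModifiers = @{
    SCNShaderModifierEntryPointGeometry : @"#pragma body\nthis_is_not_valid;"
};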
The vertex shader contains the following skinning related code:
#ifdef USE_SKINNING
uniform vec4 u_skinningJointMatrices[60];
....
#ifdef USE_SKINNING
{
    vec3 pos = vec3(0.);
#ifdef USE_NORMAL
    vec3 nrm = vec3(0.);
#endif
#if defined(USE_TANGENT) || defined(USE_BITANGENT)
    vec3 tgt = vec3(0.);
#endif
    for (int i = 0; i < MAX_BONE_INFLUENCES; ++i) {
#if MAX_BONE_INFLUENCES == 1
        float weight = 1.0;
#else
        float weight = a_skinningWeights[i];
#endif
        int idx = int(a_skinningJoints[i]) * 3;
        mat4 jointMatrix = mat4(u_skinningJointMatrices[idx], u_skinningJointMatrices[idx+1], u_skinningJointMatrices[idx+2], vec4(0., 0., 0., 1.));
        pos += (_geometry.position * jointMatrix).xyz * weight;
#ifdef USE_NORMAL
        nrm += _geometry.normal * mat3(jointMatrix) * weight;
#endif
#if defined(USE_TANGENT) || defined(USE_BITANGENT)
        tgt += _geometry.tangent.xyz * mat3(jointMatrix) * weight;
#endif
    }
    _geometry.position.xyz = pos;
which clearly implies that your skeleton should be restricted to 60 bones (note the fixed-size u_skinningJointMatrices[60] uniform array).
If all your characters have the same skeleton, then I would suggest simply checking whether the animation is executed on the CPU or the GPU using the tips above. Otherwise, you may have to fix your character skeletons so that they have at most 60 bones and no more than 4 influences per vertex.

Related

Optimizing branching for lookup tables

Branching in WebGL seems to be something like the following (paraphrased from various articles):
The shader executes its code in parallel, and if it needs to evaluate whether a condition is true before continuing (e.g. with an if statement) then it must diverge and somehow communicate with the other threads in order to come to a conclusion.
Maybe that's a bit off - but ultimately, it seems like the problem with branching in shaders is when each thread may be seeing different data. Therefore, branching with uniforms-only is typically okay, whereas branching on dynamic data is not.
Question 1: Is this correct?
Question 2: How does this relate to something that's fairly predictable but not a uniform, such as an index in a loop?
Specifically, I have the following function:
vec4 getMorph(int morphIndex) {
    /* doesn't work - can't access morphs via dynamic index
    vec4 morphs[8];
    morphs[0] = a_Morph_0;
    morphs[1] = a_Morph_1;
    ...
    morphs[7] = a_Morph_7;
    return morphs[morphIndex];
    */

    // need to do this:
    if (morphIndex == 0) {
        return a_Morph_0;
    } else if (morphIndex == 1) {
        return a_Morph_1;
    }
    ...
    else if (morphIndex == 7) {
        return a_Morph_7;
    }
}
And I call it in something like this:
for (int i = 0; i < 8; i++) {
    pos += weight * getMorph(i);
    normal += weight * getMorph(i);
    ...
}
Technically, it works fine - but my concern is all the if/else branches based on the dynamic index. Is that going to slow things down in a case like this?
For the sake of comparison, though it's tricky to explain in a few concise words here - I have an alternative idea to always run all the calculations for each attribute. This would involve potentially 24 superfluous vec4 += float * vec4 calculations per vertex. Would that be better or worse than branching 8 times on an index, usually?
Note: in my actual code there are a few more levels of mapping and indirection. While it does boil down to the same getMorph(i) question, my use case involves getting that index both from an index in a loop and from a lookup of that index in a uniform integer array.
I know this is not a direct answer to your question, but... why not just avoid the loop altogether?
vec3 pos = weight[0] * a_Morph_0 +
           weight[1] * a_Morph_1 +
           weight[2] * a_Morph_2 ...
If you want generic code (i.e. where you can set the number of morphs), then either get creative with #if / #else / #endif:
const numMorphs = ?
const shaderSource = `
...
#define NUM_MORPHS ${numMorphs}

vec3 pos = weight[0] * a_Morph_0
#if NUM_MORPHS >= 2
         + weight[1] * a_Morph_1
#endif
#if NUM_MORPHS >= 3
         + weight[2] * a_Morph_2
#endif
         ;
...
`;
or generate the shader in JavaScript with string manipulation.
function createMorphShaderSource(numMorphs) {
    const morphStrs = [];
    for (let i = 1; i < numMorphs; ++i) {
        morphStrs.push(`+ weight[${i}] * a_Morph_${i}`);
    }
    return `
    ..shader code..
    ${morphStrs.join('\n')}
    ..shader code..
    `;
}
Shader generation through string manipulation is a normal thing to do. You'll find that all the major 3D engines do this (three.js, Unreal, Unity, pixi.js, PlayCanvas, etc.).
As for whether branching is slow, it really depends on the GPU, but the general rule is that yes, it's slower no matter how it's done.
You generally can avoid branches by writing custom shaders instead of trying to be generic.
Instead of

uniform bool haveTexture;

if (haveTexture) {
    ...
} else {
    ...
}
Just write 2 shaders. One with a texture and one without.
Another way to avoid branches is to get creative with your math. For example, let's say we want to support vertex colors or textures:
varying vec4 vertexColor;
uniform sampler2D textureColor;
...
vec4 tcolor = texture2D(textureColor, ...);
gl_FragColor = tcolor * vertexColor;
Now, when we want just a vertex color, set textureColor to a 1x1-pixel white texture. When we want just a texture, turn off the attribute for vertexColor and set that attribute to white with gl.vertexAttrib4f(vertexColorAttributeLocation, 1, 1, 1, 1). And as a bonus, we can modulate the texture with vertex colors by supplying both a texture and vertex colors.
Similarly, we could pass in a 0 or a 1 to multiply certain terms by 0 or 1 to remove their influence. In your morph example, a 3D engine that is targeting performance would generate shaders for different numbers of morphs. A 3D engine that didn't care about performance would have one shader that supported N morph targets and would just set the weight of any unused targets to 0.
Yet another way to avoid branching is the step function, which is defined as

float step(float edge, float x) {
    return x < edge ? 0.0 : 1.0;
}
So you can choose a or b with
v = mix(a, b, step(edge, x));

Could NaN be causing the occasional crash in this Core Audio iOS app?

My first app synthesised music audio from a sine look-up table using methods deprecated since iOS 6. I have just revised it to address warnings about AudioSession, helped by this blog and the Apple guidelines on the AVFoundation framework. The Audio Session warnings have now been addressed and the app produces audio as it did before. It currently runs under iOS 9.
However, the app occasionally crashes for no apparent reason. I checked out this SO post, but it seems to deal with accessing rather than generating raw audio data, so it may not cover the same timing issue. I suspect there is a buffering problem, but I need to understand what it might be before I change or fine-tune anything in the code.
I have a deadline to make the revised app available to users, so I'd be most grateful to hear from someone who has dealt with a similar issue.
Here is the issue. The app breaks into the debugger on the Simulator, reporting:
com.apple.coreaudio.AQClient (8):EXC_BAD_ACCESS (code=1, address=0xffffffff10626000)
In the Debug Navigator, Thread 8 (com.apple.coreaudio.AQClient (8)), it reports:
0 -[Synth fillBuffer:frames:]
1 -[PlayView audioBufferPlayer:fillBuffer:format:]
2 playCallback
This line of code in fillBuffer is highlighted
float sineValue = (1.0f - b)*sine[a] + b*sine[c];
... and so is this line of code in audioBufferPlayer
int packetsWritten = [synth fillBuffer:buffer->mAudioData frames:packetsPerBuffer];
... and in playCallback
[player.delegate audioBufferPlayer:player fillBuffer:inBuffer format:player.audioFormat];
Here is the code for audioBufferPlayer (delegate, essentially the same as in the demo referred to above).
- (void)audioBufferPlayer:(AudioBufferPlayer*)audioBufferPlayer fillBuffer:(AudioQueueBufferRef)buffer format:(AudioStreamBasicDescription)audioFormat
{
    [synthLock lock];
    int packetsPerBuffer = buffer->mAudioDataBytesCapacity / audioFormat.mBytesPerPacket;
    int packetsWritten = [synth fillBuffer:buffer->mAudioData frames:packetsPerBuffer];
    buffer->mAudioDataByteSize = packetsWritten * audioFormat.mBytesPerPacket;
    [synthLock unlock];
}
... (initialised in myViewController)
- (id)init
{
    if ((self = [super init])) {
        // The audio buffer is managed (filled up etc.) within its own thread (Audio Queue thread).
        // Since we are also responding to changes from the GUI, we need a lock so both threads
        // do not attempt to change the same value independently.
        synthLock = [[NSLock alloc] init];
        // Synth and the AudioBufferPlayer must use the same sample rate.
        float sampleRate = 44100.0f;
        // Initialise synth to fill the audio buffer with audio samples.
        synth = [[Synth alloc] initWithSampleRate:sampleRate];
        // Initialise note buttons.
        buttons = [[NSMutableArray alloc] init];
        // Initialise the audio buffer.
        player = [[AudioBufferPlayer alloc] initWithSampleRate:sampleRate channels:1 bitsPerChannel:16 packetsPerBuffer:1024];
        player.delegate = self;
        player.gain = 0.9f;
        [[AVAudioSession sharedInstance] setActive:YES error:nil];
    }
    return self;
} // initialisation
... and for playCallback
static void playCallback(void* inUserData, AudioQueueRef inAudioQueue, AudioQueueBufferRef inBuffer)
{
    AudioBufferPlayer* player = (AudioBufferPlayer*) inUserData;
    if (player.playing) {
        [player.delegate audioBufferPlayer:player fillBuffer:inBuffer format:player.audioFormat];
        AudioQueueEnqueueBuffer(inAudioQueue, inBuffer, 0, NULL);
    }
}
... and here is the code for fillBuffer where audio is synthesised
- (int)fillBuffer:(void*)buffer frames:(int)frames
{
    SInt16* p = (SInt16*)buffer;
    // Loop through the frames (or "block size"), then consider each sample for each tone.
    for (int f = 0; f < frames; ++f)
    {
        float m = 0.0f; // the mixed value for this frame
        for (int n = 0; n < MAX_TONE_EVENTS; ++n)
        {
            if (tones[n].state == STATE_INACTIVE) // only active tones
                continue;
            // recalculate a 30sec envelope and place in a look-up table
            // Longer notes need to interpolate through the envelope
            int a = (int)tones[n].envStep;  // integer part (like a floored float)
            float b = tones[n].envStep - a; // decimal part (like doing a modulo)
            // c allows us to calculate if we need to wrap around
            int c = a + 1;                  // (like a ceiling of integer part)
            if (c >= envLength) c = a;      // don't wrap around
            /////////////// LOOK UP ENVELOPE TABLE /////////////////
            // uses table look-up with interpolation for both level and pitch envelopes
            // 'b' is a value interpolated between 2 successive samples 'a' and 'c'
            // first, read values for the level envelope
            float envValue = (1.0f - b)*tones[n].levelEnvelope[a] + b*tones[n].levelEnvelope[c];
            // then the pitch envelope
            float pitchFactorValue = (1.0f - b)*tones[n].pitchEnvelope[a] + b*tones[n].pitchEnvelope[c];
            // Advance envelope pointer one step
            tones[n].envStep += tones[n].envDelta;
            // Turn note off at the end of the envelope.
            if (((int)tones[n].envStep) >= envLength) {
                tones[n].state = STATE_INACTIVE;
                continue;
            }
            // Precalculated sine look-up table
            a = (int)tones[n].phase;  // integer part
            b = tones[n].phase - a;   // decimal part
            c = a + 1;
            if (c >= sineLength) c -= sineLength; // wrap around
            ///////////////// LOOK UP OF SINE TABLE ///////////////////
            float sineValue = (1.0f - b)*sine[a] + b*sine[c];
            // Wrap round when we get to the end of the sine look-up table.
            tones[n].phase += (tones[n].frequency * pitchFactorValue); // calculate frequency for each point in the pitch envelope
            if (((int)tones[n].phase) >= sineLength)
                tones[n].phase -= sineLength;
            ////////////////// RAMP NOTE OFF IF IT HAS BEEN UNPRESSED
            if (tones[n].state == STATE_UNPRESSED) {
                tones[n].gain -= 0.0001;
                if (tones[n].gain <= 0) {
                    tones[n].state = STATE_INACTIVE;
                }
            }
            //////////////// FINAL SAMPLE VALUE ///////////////////
            float s = sineValue * envValue * gain * tones[n].gain;
            // Clip the signal, if needed.
            if (s > 1.0f) s = 1.0f;
            else if (s < -1.0f) s = -1.0f;
            // Add the sample to the out-going signal
            m += s;
        }
        // Write the sample mix to the buffer as a 16-bit word.
        p[f] = (SInt16)(m * 0x7FFF);
    }
    return frames;
}
I'm not sure whether it is a red herring, but I came across NaN in several debug registers. It appears to happen while calculating the phase increment for the sine lookup in fillBuffer (see above). That calculation is done for up to a dozen partials per sample at a sampling rate of 44.1 kHz, and it worked in iOS 4 on an iPhone 4. I'm now running on the iOS 9 Simulator. The only changes I made are described in this post!
My NaN problem turned out to have nothing directly to do with Core Audio. It was caused by an edge case introduced by changes in another area of my code. The real problem was a division by zero attempted while calculating the duration of the sound envelope in real time.
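To illustrate the kind of guard involved (a sketch only; the envelope-setup code isn't shown above, so noteDuration here is a made-up name), the fix amounts to checking the denominator before deriving the per-sample envelope step:
// Hypothetical sketch: noteDuration comes from the GUI and could reach zero,
// which previously produced INF/NaN in the envelope step.
float envSamples = noteDuration * sampleRate;   // samples the envelope should span
if (envSamples > 0.0f) {
    tones[n].envDelta = envLength / envSamples; // envelope table steps per sample
} else {
    tones[n].state = STATE_INACTIVE;            // drop the tone rather than divide by zero
}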
However, in trying to identify the cause of that problem, I became confident that my pre-iOS 7 Audio Session code has been replaced by a working setup based on AVFoundation. Thanks go to the source of my initial code, Matthijs Hollemans, and also to Mario Diana, whose blog explained the changes needed.
At first, the sound levels on my iPhone were significantly less than the sound levels on the Simulator, a problem addressed here by foundry. I found it necessary to include these improvements by replacing Mario's
- (BOOL)setUpAudioSession
with foundry's
- (void)configureAVAudioSession
Hopefully this might help someone else.

Animating rotation changes of UIImageView

I'm making an app that (among other things) displays a simplified compass image that rotates according to the device's rotation. The problem is that simply doing this:
float heading = -1.0f * M_PI * trueHeading / 180.0f; //trueHeading is always between 0 and 359, never 360
self.compassNeedle.transform = CGAffineTransformMakeRotation(heading);
inside the CLLocationManager delegate's didUpdateHeading method makes the animation ugly and choppy.
I have already used Instruments to find out whether it's simply my app not being able to render at more than 30-48 fps, but that's not the case.
How can I smooth out the image view's rotation so that it's more like Apple's own Compass app?
Instead of using the current instantaneous value, try using the average of the last N values of the true heading. The value may be jumping around a lot at any single instant but settles down "in the average".
Assuming you have a member variable storedReadings which is an NSMutableArray:
- (void)addReading:(float)newReading
{
    [storedReadings addObject:[NSNumber numberWithFloat:newReading]];
    while ([storedReadings count] > MAX_READINGS)
    {
        [storedReadings removeObjectAtIndex:0];
    }
}
then when you need the average value (timer update?)
- (float)calcReading
{
    float result = 0.0f;
    if ([storedReadings count] > 0)
    {
        for (NSNumber* reading in storedReadings)
        {
            result += [reading floatValue];
        }
        result /= [storedReadings count];
    }
    return result;
}
You get to pick MAX_READINGS a priori.
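For example, the question's heading callback could feed the filter and use the smoothed value (a sketch, assuming addReading: and calcReading live in the same class as compassNeedle):
// CLLocationManagerDelegate callback: push each raw heading into the filter,
// then rotate the needle with the averaged value instead of the raw one.
- (void)locationManager:(CLLocationManager *)manager didUpdateHeading:(CLHeading *)newHeading
{
    [self addReading:newHeading.trueHeading];
    float heading = -1.0f * M_PI * [self calcReading] / 180.0f;
    self.compassNeedle.transform = CGAffineTransformMakeRotation(heading);
}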
NEXT LEVEL(S) UP
If the readings are not jumping around too much but the animation is still choppy, you probably need to do something like a "smooth" rotation. At any given time, you have the current angle you are displaying, theta (store this in your class; start it out at 0). You also have your target angle, call it target. This is the value you get from the smoothed calcReading function. The error is defined as the difference between the two:
error = target-theta;
Set up a timer callback with a period of something like 0.05 seconds (20x per second). What you want to do is adjust theta so that the error is driven towards 0. You can do this in a couple of ways:
thetaNext += kProp * (target - theta); //This is proportional feedback.
thetaNext += kStep * sign(target-theta); // This moves theta a fixed amount each update. sign(x) = +1 if x >= 0 and -1 if x < 0.
The first solution will cause the rotation to change faster the farther it is from the target. It will also probably oscillate a little bit as it swings past the "zero" point. Bigger values of kProp will yield a faster response but also more oscillation. Some tuning will be required.
The second solution will be much easier to control... it just "ticks" the compass needle around each time. You can set kStep to something like 1/4 degree, which gives you a rotation "speed" of about (1/4 deg/update) * (20 updates/second) = 5 degrees per second. This is a bit slow, but you can see the math and change kStep to suit your needs. Note that you may want to "band" the error value so that no action is taken if the error < kStep (or something like that). This prevents your compass from shifting when the angle is really close to the target. You can also reduce kStep when the error is small so that it "slides" into the ending position (i.e. kStep is smaller when the error is small).
For dealing with angle issues (wrap-around), I "normalize" the angle so it is always within ±Pi. I don't guarantee this is the perfect way to do it, but it seems to get the job done:
// Takes an angle greater than +/- M_PI and converts it back
// to +/- M_PI. Useful in Box2D where angles continuously
// increase/decrease.
static inline float32 AdjustAngle(float32 angleRads)
{
    if (angleRads > M_PI)
    {
        while (angleRads > M_PI)
        {
            angleRads -= 2*M_PI;
        }
    }
    else if (angleRads < -M_PI)
    {
        while (angleRads < -M_PI)
        {
            angleRads += 2*M_PI;
        }
    }
    return angleRads;
}
By doing it this way, -Pi is the angle you reach going in either direction as you continue to rotate left/right. That is to say, there is no discontinuity in the value going from, say, 0 to 359 degrees.
SO PUTTING THIS ALL TOGETHER
static inline float Sign(float value)
{
    if (value >= 0)
        return 1.0f;
    return -1.0f;
}

//#define ROTATION_OPTION_1
//#define ROTATION_OPTION_2
#define ROTATION_OPTION_3

- (void)updateArrow
{
    // Calculate the angle to the player.
    CGPoint toPlayer = ccpSub(self.player.position, self.arrow.position);
    // Calculate the angle of this...Note there are some inversions
    // and the actual image is rotated 90 degrees so I had to offset it
    // a bit.
    float angleToPlayerRads = -atan2f(toPlayer.y, toPlayer.x);
    angleToPlayerRads = AdjustAngle(angleToPlayerRads);
    // This is the angle we "wish" the arrow would be pointing.
    float targetAngle = CC_RADIANS_TO_DEGREES(angleToPlayerRads) + 90;
    float errorAngle = targetAngle - self.arrow.rotation;
    CCLOG(@"Error Angle = %f", errorAngle);

#ifdef ROTATION_OPTION_1
    // In this option, we just set the angle of the rotated sprite directly.
    self.arrow.rotation = CC_RADIANS_TO_DEGREES(angleToPlayerRads) + 90;
#endif

#ifdef ROTATION_OPTION_2
    // In this option, we apply proportional feedback to the angle
    // difference.
    const float kProp = 0.05f;
    self.arrow.rotation += kProp * (errorAngle);
#endif

#ifdef ROTATION_OPTION_3
    // The step to take each update in degrees.
    const float kStep = 4.0f;
    // NOTE: Without the "if (fabs(...))" check, the angle
    // can "dither" around the zero point when it is very close.
    if (fabs(errorAngle) > kStep)
    {
        self.arrow.rotation += Sign(errorAngle) * kStep;
    }
#endif
}
I put this code into a demo program I had written for Cocos2d. It shows a character (big box) being chased by some monsters (smaller boxes) and has an arrow in the center that always points towards the character. The updateArrow call is made regularly on a timer tick (the update(dt) function). The player's position on the screen is set by the user tapping on the screen, and the angle is based on the vector from the arrow to the player. In the function, I show all three options for setting the angle of the arrow:
Option 1
Just set it based on where the player is (i.e. just set it).
Option 2
Use proportional feedback to adjust the arrow's angle each time step.
Option 3
Step the angle of the arrow each timestep a little bit if the error angle is more than the step size.
Here is a picture showing roughly what it looks like:
And, all the code is available here on github. Just look in the HelloWorldLayer.m file.

How to get the texture size in HLSL?

For an HLSL shader I'm working on (for practice), I'm trying to execute a part of the code if the texture coordinates (on a model) are above half the respective size (that is, x > width / 2 or y > height / 2). I'm familiar with C/C++ and know the basics of HLSL (the very basics). If no other solution is possible, I will set the texture size manually from XNA (in which I'm using the shader, as a matter of fact). Is there a better solution? I'm trying to remain within Shader Model 2.0 if possible.
The default texture coordinate space is normalized to 0..1 so x > width / 2 should simply be texcoord.x > 0.5.
Be careful here. tex2D() and other texture calls should NOT be within if()/else clauses. So if you have a pixel shader input "IN.UV" and you're aiming at "OUT.color", you need to do it this way:
float4 aboveCol = tex2D(mySampler, some_texcoords);
float4 belowCol = tex2D(mySampler, some_other_texcoords);
if (IN.UV.x >= 0.5) {
    OUT.color = /* some function of... */ aboveCol;
} else {
    OUT.color = /* some function of... */ belowCol;
}
rather than putting the tex2D() calls inside the if() blocks.

Problem with HLSL looping/sampling

I have a piece of HLSL code which looks like this:
float4 GetIndirection(float2 TexCoord)
{
    float4 indirection = tex2D(IndirectionSampler, TexCoord);
    for (half mip = indirection.b * 255; mip > 1 && indirection.a < 128; mip--)
    {
        indirection = tex2Dlod(IndirectionSampler, float4(TexCoord, 0, mip));
    }
    return indirection;
}
The results I am getting are consistent with that loop only executing once. I checked the shader in PIX and things got even more weird: the yellow arrow indicating the position in the code gets to the loop, goes through it once, and jumps back to the start. At that point the yellow arrow never moves again, but the cursor moves through the code and returns a result (a bug in PIX, or am I just using it wrong?).
I have a suspicion this may be a problem with texture reads getting moved outside the loop by the compiler; however, I thought that didn't happen with tex2Dlod, since I'm setting the LOD manually :/
So:
1) What's the problem?
2) Any suggested solutions?
The problem was solved; it was a simple coding mistake: I needed to increase the mip level on each iteration, not decrease it.
float4 GetIndirection(float2 TexCoord)
{
    float4 indirection = tex2D(IndirectionSampler, TexCoord);
    for (half mip = indirection.b * 255; mip > 1 && indirection.a < 128; mip++)
    {
        indirection = tex2Dlod(IndirectionSampler, float4(TexCoord, 0, mip));
    }
    return indirection;
}
