Game Engine Collision Bitmask... Why 0x01 etc? - ios

Coming across this situation both in Sprite Kit (iOS development) and in Cocos2d-x (which I know was pretty much the inspiration for Sprite Kit, hence the many shared tools), I finally decided to figure out why this happens:
When using a physics engine, I create a sprite and add a physicsBody to it. For the most part, I understand how to set the category, collision, and contact bitmasks, and how they work. The problem is the actual bitmask number:
SpriteKit:
static const uint32_t missileCategory = 0x1 << 0;
sprite.physicsBody.categoryBitMask = missileCategory;
Cocos2D-X:
sprite->getPhysicsBody()->setCategoryBitmask(0x01); // 0001
I'm totally confused as to why I would write 0x01 or 0x1 << 0 in either case. I somewhat get that they're using hex, and it has something to do with 32-bit integers. And as far as I've been able to google, 0x01 is 0001 in binary which is 1 in decimal. And 0x02 is 0010 in binary which is 2 in decimal. Okay, so there are these conversions, but why in the world would I use them for something simple like categories?
As far as my logic goes, if I have, let's say, a player category, an enemy category, a missile category, and a wall category, that's just 4 categories. Why not use strings for the category? Or even just plain numbers that any non-CS person would understand, like 0, 1, 2, and 3?
And finally, I'm confused why there are 32 different categories available? I thought a 32-bit integer had numbers 0-some billion number (unsigned of course). So why do I not have billions of different possible categories?
Is there some sort of optimization I am not understanding? Or is this just an old convention they use that is no longer needed? Or is there something going on that someone with only 2 semesters of college CS training wouldn't understand?

The reason for the bitmasks is that they enable you / the program to easily and very quickly compute whether a collision between two objects occurs or does not occur. Therefore: yes, it is some sort of optimization.
Assuming we have the three categories
player 0x1 << 0
missile 0x1 << 1
wall 0x1 << 2
Now we have a Player instance, its category is set to player. Its collision bitmask is set to missile | player | wall (+ instead of | works too) since we want to be able to collide with all three types: other players, the level walls and the bullets / missiles flying around.
Now we have a Missile with category set to missile and collision bitmask set to player | wall: it does not collide with other missiles but hits players and walls.
If we now want to evaluate whether two objects can collide with each other, we take the category bitmask of the first one and the collision bitmask of the second one and simply & them:
The setup described above looks like the following in code:
let player : UInt8 = 0b1 << 0 // 00000001 = 1
let missile : UInt8 = 0b1 << 1 // 00000010 = 2
let wall : UInt8 = 0b1 << 2 // 00000100 = 4
let playerCollision = player | missile | wall // 00000111 = 7
let missileCollision = player | wall // 00000101 = 5
The subsequent reasoning is basically:
if player & missileCollision != 0 {
    print("potential collision between player and missile") // prints
}
if missile & missileCollision != 0 {
    print("potential collision between two missiles") // does not print
}
We are using bit arithmetic here: each bit represents one category.
You could simply enumerate the categories 1, 2, 3, 4, 5, ... but then you could not do any math on them, because you would not know whether a bitmask of 5 really means category 5 or an object belonging to both categories 1 and 4.
However, using one bit per category we can do just that: the only way to write 7 as a sum of distinct powers of 2 is 4 + 2 + 1, therefore whatever object has collision bitmask 7 collides with categories 4, 2 and 1. And the one with bitmask 5 is exactly and only a combination of categories 1 and 4 - there is no other way.
Since each category uses one bit and a regular integer has only 32 (or 64) bits, we can only have 32 (or 64) categories.
Take a look at the following, somewhat more extensive code which demonstrates how the masks are used in more general terms:
let playerCategory : UInt8 = 0b1 << 0
let missileCategory : UInt8 = 0b1 << 1
let wallCategory : UInt8 = 0b1 << 2
struct EntityStruct {
    var categoryBitmask : UInt8
    var collisionBitmask : UInt8
}
let player = EntityStruct(categoryBitmask: playerCategory, collisionBitmask: playerCategory | missileCategory | wallCategory)
let missileOne = EntityStruct(categoryBitmask: missileCategory, collisionBitmask: playerCategory | wallCategory)
let missileTwo = EntityStruct(categoryBitmask: missileCategory, collisionBitmask: playerCategory | wallCategory)
let wall = EntityStruct(categoryBitmask: wallCategory, collisionBitmask: playerCategory | missileCategory | wallCategory)
func canTwoObjectsCollide(_ first: EntityStruct, _ second: EntityStruct) -> Bool {
    if first.categoryBitmask & second.collisionBitmask != 0 {
        return true
    }
    return false
}
canTwoObjectsCollide(player, missileOne) // true
canTwoObjectsCollide(player, wall) // true
canTwoObjectsCollide(wall, missileOne) // true
canTwoObjectsCollide(missileTwo, missileOne) // false
The important part here is that the method canTwoObjectsCollide does not care about the type of the objects or how many categories there are. As long as you stick with bitmasks, that is all you need to determine whether or not two objects can theoretically collide (ignoring their positions, which is a task for another day).

luk2302's answer is great, but just to go a bit further and in other directions...
Why hex notation? (0x1 << 2 etc)
Once you know that bit positions are the important part, it is (as mentioned in comments) just a matter of style/readability. You could just as well do:
let catA = 0b0001
let catB = 0b0010
let catC = 0b0100
But binary literals like that are (as far as Apple tools are concerned) new to Swift and not available in ObjC.
You could also do:
static const uint32_t catA = 1 << 0;
static const uint32_t catB = 1 << 1;
static const uint32_t catC = 1 << 2;
or:
static const uint32_t catA = 1;
static const uint32_t catB = 2;
static const uint32_t catC = 4;
But, for historical/cultural reasons, it's become common convention among programmers to use hexadecimal notation as a way of reminding oneself/other readers of your code that a particular integer literal is significant more for its bit pattern than its absolute value. (Also, for the second C example you have to remember which bit has which place value, whereas with << operator or binary literals you can emphasize the position.)
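For reference, the hex-literal version of the same pattern looks like this (a small Swift sketch; the names are just placeholders):
// Each constant sets a single bit; the hex form is read for its bit pattern, not its decimal value.
let catA: UInt32 = 0x01  // binary 0001 = 1
let catB: UInt32 = 0x02  // binary 0010 = 2
let catC: UInt32 = 0x04  // binary 0100 = 4
let catD: UInt32 = 0x08  // binary 1000 = 8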
Why bit patterns? Why not ___?
Using bit patterns / bit masks is a performance optimization. To check for collisions, a physics engine must examine every pair of objects in the world. Because it's pairwise, the performance cost is quadratic: if you have 4 objects, you have 4*4 = 16 possible collisions to check... 5 objects is 5*5 = 25 possible collisions, etc. You can cut that list down with some obvious exclusions (no worries about an object colliding with itself, A collides with B is the same as B collides with A, etc.), but the growth is still quadratic; that is, for n objects, you have O(n²) possible collisions to check. (And remember, we're counting total objects in the scene, not categories.)
Many interesting physics games have a lot more than 5 total objects in the scene, and run at 30 or 60 frames per second (or at least want to). That means the physics engine has to check all those possible collision pairs in 16 milliseconds. Or preferably, much less than 16 ms, because it still has other physics-y stuff to do before/after finding collisions, and the game engine needs time to render, and you probably want time for your game logic in there, too.
Bit mask comparisons are very fast. Something like the mask comparison:
if (bodyA.categoryBitMask & bodyB.collisionBitMask != 0)
...is one of the quickest things you can ask an ALU to do — like one or two clock cycles fast. (Anyone know where to track down actual cycles per instruction figures?)
By contrast, string comparison is an algorithm in itself, requiring a lot more time. (Not to mention some easy way to have those strings express the combinations of categories that should result in collisions.)
A challenge
Since bit masks are a performance optimization, they might as well be a (private) implementation detail. But most physics engines, including SpriteKit's, leave them as part of the API. It'd be nicer to have a way of saying "these are my categories, and this is how they should interact" at a high level, and let someone else handle the details of translating that description into bit masks. Apple's DemoBots sample code project appears to have one idea for simplifying such things (see ColliderType in the source)... feel free to use it or design your own.
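One direction such a wrapper could take (only a sketch, not Apple's DemoBots code): define the categories as a Swift OptionSet so the rest of the game never touches raw hex values, and only the raw value is handed to the physics body.
import SpriteKit

// Hypothetical category type; the engine still sees plain UInt32 masks underneath.
struct PhysicsCategory: OptionSet {
    let rawValue: UInt32
    static let player  = PhysicsCategory(rawValue: 1 << 0)
    static let missile = PhysicsCategory(rawValue: 1 << 1)
    static let wall    = PhysicsCategory(rawValue: 1 << 2)
}

let body = SKPhysicsBody(circleOfRadius: 10)
body.categoryBitMask  = PhysicsCategory.missile.rawValue
body.collisionBitMask = PhysicsCategory([.player, .wall]).rawValue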

To answer your specific question
"why there are 32 different categories available? I thought a 32-bit
integer had numbers 0-some billion number (unsigned of course). So why
do I not have billions of different possible categories?"
The answer is that each individual category is treated as a 32-bit mask in which ONLY ONE bit is set. So these are the valid values for a single category:
00000000000000000000000000000001 = 1 = 1 << 0
00000000000000000000000000000010 = 2 = 1 << 1
00000000000000000000000000000100 = 4 = 1 << 2
00000000000000000000000000001000 = 8 = 1 << 3
00000000000000000000000000010000 = 16 = 1 << 4
00000000000000000000000000100000 = 32 = 1 << 5
00000000000000000000000001000000 = 64 = 1 << 6
00000000000000000000000010000000 = 128 = 1 << 7
00000000000000000000000100000000 = 256 = 1 << 8
00000000000000000000001000000000 = 512 = 1 << 9
00000000000000000000010000000000 = 1024 = 1 << 10
00000000000000000000100000000000 = 2048 = 1 << 11
.
.
.
10000000000000000000000000000000 = 2,147,483,648 = 1 << 31
So there are 32 different categories available. Your categoryBitMask, however, can have multiple bits set, so it can indeed be any number from 1 up to the maximum of a UInt32. For example, in an arcade game you might have categories such as:
00000000000000000000000000000001 = 1 = 1 << 0 //Human
00000000000000000000000000000010 = 2 = 1 << 1 //Alien
00000000000000000000000000000100 = 4 = 1 << 2 //Soldier
00000000000000000000000000001000 = 8 = 1 << 3 //Officer
00000000000000000000000000010000 = 16 = 1 << 4 //Bullet
00000000000000000000000000100000 = 32 = 1 << 5 //laser
00000000000000000000000001000000 = 64 = 1 << 6 //powershot
So a human civilian might have a categoryBitMask of 1, a human soldier 5 (1 + 4), an alien officer 10 (2 + 8), a normal bullet 16, a missile 80 (16 + 64), a mega-death ray 96 (32 + 64), etc.
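A quick sketch of that arithmetic in Swift (the names and values simply mirror the hypothetical table above):
let human: UInt32     = 1 << 0   // 1
let alien: UInt32     = 1 << 1   // 2
let soldier: UInt32   = 1 << 2   // 4
let officer: UInt32   = 1 << 3   // 8
let bullet: UInt32    = 1 << 4   // 16
let powershot: UInt32 = 1 << 6   // 64

// Combined categories are just ORs of single bits.
let humanSoldier = human | soldier      // 5
let alienOfficer = alien | officer      // 10
let missile      = bullet | powershot   // 80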

Related

How to use segmentation model output tensor?

I'm trying to run the segmentation model on iOS and I have several questions about how I should properly use the output tensor.
Here is the link to the model I'm using:
https://www.tensorflow.org/lite/models/segmentation/overview
When I run this model I'm getting the output tensor with dimension:
1 x 257 x 257 x 21.
Why do I get 21 as the last dimension? It looks like for each pixel we are getting the class scores. Do we need to find the argmax here to get the correct class value?
But why only 21 classes? I was thinking it should contain more. And where can I find the info about which value corresponds to which class?
In the ImageClassification example we have a label.txt with 1001 classes.
Based on the ImageClassification example, I made an attempt to parse the tensor: first I transform it into a Float array of size 1,387,029 (21 x 257 x 257), and then, using the following code, I create an image pixel by pixel:
// size = 257
// depth = 21
// array - float array of size 1 387 029
for i in 0..<size {
    for j in 0..<size {
        var scores: [Float] = []
        for k in 0..<depth {
            let index = i * size * depth + j * depth + k
            let score = array[index]
            scores.append(score)
        }
        if let maxScore = scores.max(),
           let maxClass = scores.firstIndex(of: maxScore) {
            let index = i * size + j
            if maxClass == 0 {
                pixelBuffer[index] = .blue
            } else if maxClass == 12 {
                pixelBuffer[index] = .black
            } else {
                pixelBuffer[index] = .green
            }
        }
    }
}
Here is the result I get:
You can see that the quality is not really good. What have I missed?
The segmentation model for CoreML (https://developer.apple.com/machine-learning/models/) works much better on the same example:
It seems like your model was trained on PASCAL VOC data that has 21 classes for segmentation.
You can find a list of the classes here:
background
aeroplane
bicycle
bird
boat
bottle
bus
car
cat
chair
cow
diningtable
dog
horse
motorbike
person
pottedplant
sheep
sofa
train
tvmonitor
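To map an argmax index back to one of these labels, a simple lookup table is enough. A Swift sketch, assuming the model follows this standard VOC ordering:
// Index in the 21-element score vector -> PASCAL VOC class name (assumed ordering).
let vocLabels = [
    "background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus",
    "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
    "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"
]
print(vocLabels[15])   // "person"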
Adding to the answer by Shai: you can also use a tool like Netron to visualize your network and get more insight into the inputs and outputs. For example, your input would be an image of size 257x257x3:
You already know your output size. For segmentation models you get that 21 because it is the number of classes your model supports, as Shai mentioned. Take the argmax over the classes for each pixel and that should give you a more decent output; there is no need to resize anything. Try something like this (in pseudo-code):
output = [rows][cols]
for i in rows:
    for j in cols:
        best_class = 0
        best_score = tensor_out[i][j][0]
        for c in classes:
            if tensor_out[i][j][c] > best_score:
                best_score = tensor_out[i][j][c]
                best_class = c
        output[i][j] = best_class
Then output would be your segmented image.
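A hedged Swift version of the same argmax decoding, assuming array is the flat [Float] output of shape 1 x 257 x 257 x 21 described in the question:
let size = 257
let depth = 21
var labelMap = [Int](repeating: 0, count: size * size)

for i in 0..<size {
    for j in 0..<size {
        var bestClass = 0
        var bestScore = -Float.greatestFiniteMagnitude
        for k in 0..<depth {
            let score = array[i * size * depth + j * depth + k]
            if score > bestScore {
                bestScore = score
                bestClass = k
            }
        }
        labelMap[i * size + j] = bestClass   // class index, e.g. 15 == person
    }
}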

Random values with different weights

Here's a question about Entity Framework that has been bugging me for a bit.
I have a table called prizes that holds different prizes, some with higher and some with lower monetary values. A simple representation of it would be as such:
+----+---------+--------+
| id | name | weight |
+----+---------+--------+
| 1 | Prize 1 | 80 |
| 2 | Prize 2 | 15 |
| 3 | Prize 3 | 5 |
+----+---------+--------+
Weight in this case is the likelihood that I would like this item to be randomly selected.
I select one random prize at a time like so:
var prize = db.Prizes.OrderBy(r => Guid.NewGuid()).Take(1).First();
What I would like to do is use the weight to determine the likelihood of a random item being returned, so Prize 1 would be returned 80% of the time, Prize 2 15% of the time, and so on.
I thought that one way of doing this would be to store each prize in the database as many times as its weight. That way, Prize 1 appearing 80 times would give it a higher likelihood of being returned compared to Prize 3, but this is not necessarily exact.
There has to be a better way of doing this, so I was wondering if you could help me out.
Thanks in advance
Normally I would not do this in database, but rather use code to solve the problem.
In your case, I would generate a random number from 1 to 100. If the number generated is between 1 and 80 then the 1st one wins, if it's between 81 and 95 then the 2nd one wins, and if it's between 96 and 100 the last one wins.
Because the random number can be any number from 1 to 100, each number has a 1% chance of being hit, so you can control the winning chance by choosing the range that the random number falls into.
Hope this helps.
Henry
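A minimal sketch of this idea (shown in Swift purely to illustrate the arithmetic; the prize names and weights are assumed from the table above):
// Weights assumed from the prizes table: 80, 15, 5 (they sum to 100).
let prizes = ["Prize 1", "Prize 2", "Prize 3"]
let weights = [80, 15, 5]

let roll = Int.random(in: 1...weights.reduce(0, +))   // 1...100
var cumulative = 0
var winner = prizes.last!
for (prize, weight) in zip(prizes, weights) {
    cumulative += weight
    if roll <= cumulative {   // 1...80 -> Prize 1, 81...95 -> Prize 2, 96...100 -> Prize 3
        winner = prize
        break
    }
}
print("You have won: \(winner)")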
This can be done by creating bins for the three (generally, n) items and then choosing which bin a random number falls into.
There might be a statistical library that can do this for you, i.e. proportionately select a bin from n bins.
A solution that does not limit you to three prizes/weights could be implemented like below:
//All Prizes in the Database
var allRows = db.Prizes.ToList();
//All Weight Values
var weights = db.Prizes.Select(p => new { p.Weight });
//Sum of Weights
var weightsSum = weights.AsEnumerable().Sum(w => w.Weight);
//Construct Bins e.g. [0, 80, 95, 100]
//Three Bins are: (0-80],(80-95],(95-100]
int[] bins = new int[weights.Count() + 1];
int start = 0;
bins[start] = 0;
foreach (var weight in weights)
{
    start++;
    bins[start] = bins[start - 1] + weight.Weight;
}
//Generate a random number between 1 and weightsSum (inclusive)
Random rnd = new Random();
int selection = rnd.Next(1, weightsSum + 1);
//Find the bin the random number falls into
int chosenBin = 0;
for (chosenBin = 0; chosenBin < bins.Length - 1; chosenBin++)
{
    if (bins[chosenBin] < selection && selection <= bins[chosenBin + 1])
    {
        break;
    }
}
//Announce the Prize
Console.WriteLine("You have won: " + allRows.ElementAt(chosenBin));

for loops in Objective-C

Apologies for a basic question. I have been checking out the for loops here and here, and if we analyse the first code, for example:
for(int i = 0; i < CFDataGetLength(pixelData); i += 4) {
    pixelBytes[i]   // red
    pixelBytes[i+1] // green
    pixelBytes[i+2] // blue
    pixelBytes[i+3] // alpha
}
The variable i is being incremented from 0 to the length of the array pixelData, in steps of 4.
However, how does pixelBytes[i+3] access the alpha channel of the image? For example, if i=5, how does pixelBytes[5+3] equal the alpha channel instead of just accessing the 8th element of pixelBytes?
If i starts at zero and is incremented by 4 each time, how can it ever equal 5?
Presumably, the structure is stored with each channel occupying one byte: first red, then green, then blue, then alpha, then red again, and so on. The for loop mimics this structure by incrementing i by four each time, so if the first time through pixelBytes[i+1] is the first green value, the second time through it will be four bytes later and thus the second green value.
Sometimes it helps to unroll the loop on a sheet of paper:
// First pixel
RGBA
^    Index 0 = i(0) + 0
 ^   Index 1 = i(0) + 1
  ^  Index 2 = i(0) + 2
   ^ Index 3 = i(0) + 3
i + 4
// Second pixel
RGBA RGBA
     ^    Index 4 = i(4) + 0
      ^   Index 5 = i(4) + 1
       ^  Index 6 = i(4) + 2
        ^ Index 7 = i(4) + 3
i + 4
// Third pixel
RGBA RGBA RGBA
          ^    Index 8 = i(8) + 0
           ^   Index 9 = i(8) + 1
            ^  Index 10 = i(8) + 2
             ^ Index 11 = i(8) + 3
You have colours stored in the RGBA format. In the RGBA format, one colour is stored in 4 bytes, the first byte being the value for red (R), second is green (G), third is blue (B), and last is alpha (A).
Your own code explains this pretty well in its comments:
pixelBytes[i] // red
pixelBytes[i+1] // green
pixelBytes[i+2] // blue
pixelBytes[i+3] // alpha
It is important to note though, that if i is not a multiple of 4, you're not going to be reading the colours correctly anymore.
While the code isn't shown, it is likely that pixelBytes is an array whose size equals the total number of colours times 4, which is the same as the total number of bytes used to represent the colours (since each colour is stored in 4 bytes).
A typical 32 bit pixel consists of four channels, alpha, red, green and blue.
My guess is that pixelbytes is a bytebuffer of these, so:
pixelbuffer[0] = r
pixelbuffer[1] = g
pixelbuffer[2] = b
pixelbuffer[3] = a
as your code says.
On each iteration, it adds four bytes (8 bit * 4 = 32 bit) to the counter, which is the offset to the next 32-bit pixel. The individual components can be accessed through a byte offset (i + <0-3>).
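The same idea in a short Swift sketch (the two example pixels are made up): walk a flat RGBA byte buffer in strides of 4, so the loop index always points at the red byte of one pixel.
let pixelBytes: [UInt8] = [255, 0, 0, 255,   // pixel 0: opaque red
                           0, 255, 0, 128]   // pixel 1: half-transparent green

for i in stride(from: 0, to: pixelBytes.count, by: 4) {
    let r = pixelBytes[i]
    let g = pixelBytes[i + 1]
    let b = pixelBytes[i + 2]
    let a = pixelBytes[i + 3]
    print("pixel \(i / 4): r=\(r) g=\(g) b=\(b) a=\(a)")
}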

PGMidi changing pitch sendBytes example

I've been trying for two days to send a MIDI signal. I'm using the following code:
int pitchValue = 8191; // or -8192
int msb = ?;
int lsb = ?;
UInt8 midiData[] = { 0xe0, msb, lsb};
[midi sendBytes:midiData size:sizeof(midiData)];
I don't understand how to calculate msb and lsb. I tried pitchValue << 8, but it works incorrectly: when I look at the events using a MIDI tool I see a minimum of -8192 and a maximum of +8064. I want to get -8192 and +8191.
Sorry if the question is simple.
Pitch bend data is offset to avoid any sign bit concerns. The maximum negative deviation is sent as a value of zero, not -8192, so you have to compensate for that, something like this Python code:
def EncodePitchBend(value):
    ''' return a 2-tuple containing (msb, lsb) '''
    if (value < -8192) or (value > 8191):
        raise ValueError
    value += 8192
    return (((value >> 7) & 0x7F), (value & 0x7F))
Since MIDI data bytes are limited to 7 bits, you need to split pitchValue into two 7-bit values:
int msb = (pitchValue + 8192) >> 7 & 0x7F;
int lsb = (pitchValue + 8192) & 0x7F;
Edit: as @bgporter pointed out, pitch wheel values are offset by 8192 so that "zero" (i.e. the center position) is at 8192 (0x2000), so I edited my answer to offset pitchValue by 8192.
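For completeness, here is a small Swift sketch of the same encoding (the function name is just an example): offset the signed bend by 8192, then split the 14-bit result into two 7-bit MIDI data bytes. Per the MIDI spec the pitch-bend data bytes are sent LSB first after the 0xE0 status byte.
func encodePitchBend(_ value: Int) -> (msb: UInt8, lsb: UInt8) {
    precondition((-8192...8191).contains(value), "pitch bend out of range")
    let offset = value + 8192                    // 0...16383
    return (UInt8((offset >> 7) & 0x7F), UInt8(offset & 0x7F))
}

let (msb, lsb) = encodePitchBend(8191)
let midiData: [UInt8] = [0xE0, lsb, msb]         // status, LSB, MSB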

Scaling a number between two values

If I am given a floating point number but do not know beforehand what range the number will be in, is it possible to scale that number in some meaningful way to be in another range? I am thinking of checking to see if the number is in the range 0<=x<=1 and if not scale it to that range and then scale it to my final range. This previous post provides some good information, but it assumes the range of the original number is known beforehand.
You can't scale a number in a range if you don't know the range.
Maybe what you're looking for is the modulo operator. Modulo is basically the remainder of division; the operator in most languages is %.
0 % 5 == 0
1 % 5 == 1
2 % 5 == 2
3 % 5 == 3
4 % 5 == 4
5 % 5 == 0
6 % 5 == 1
7 % 5 == 2
...
Sure, it is not possible. You can define a range and ignore all values outside it. Or you can collect statistics to find the range at run time (e.g. via histogram analysis).
Is this really about image processing? There are lots of related problems in the image segmentation field.
You want to scale a single random floating point number to be between 0 and 1, but you don't know the range of the number?
What should 99.001 be scaled to? If the range of the random number was [99, 100], then our scaled-number should be pretty close to 0. If the range of the random number was [0, 100], then our scaled-number should be pretty close to 1.
In the real world, you always have some sort of information about the range (either the range itself, or how wide it is). Without further info, the answer is "No, it can't be done."
I think the best you can do is something like this:
double scale(double x) {
    if (x < -1) return -1 / x - 2;
    if (x > 1) return 2 - 1 / x;
    return x;
}
This function is monotonic, and has a range of -2 to 2, but it's not strictly a scaling.
I am assuming that you have the result of some 2-dimensional measurements and want to display them in color or grayscale. For that, I would first want to find the maximum and minimum and then scale between these two values.
static double[][] scale(double[][] in, double outMin, double outMax) {
    double inMin = Double.POSITIVE_INFINITY;
    double inMax = Double.NEGATIVE_INFINITY;
    for (double[] inRow : in) {
        for (double d : inRow) {
            if (d < inMin)
                inMin = d;
            if (d > inMax)
                inMax = d;
        }
    }
    double inRange = inMax - inMin;
    double outRange = outMax - outMin;
    double[][] out = new double[in.length][in[0].length];
    for (int i = 0; i < in.length; i++) {
        for (int j = 0; j < in[i].length; j++) {
            double normalized = (in[i][j] - inMin) / inRange; // 0 .. 1
            out[i][j] = outMin + normalized * outRange;
        }
    }
    return out;
}
This code is untested and just shows the general idea. It further assumes that all your input data is in a "reasonable" range, away from infinity and NaN.
