Recode or compute age categories based on gender - SPSS

I'm a complete spss-novice and I can't figure it out. Google doesn't give any answers either (or I don't know how to google this question.. that's possible too).
I have to make a new variable based on two variables:
Gender (0=male, 1=female)
and
oudad (1=age 18-29; 2=age 30-54; 3=age 55-89; 4=deselect 99).
The new variable has to have six categories:
1. 18-29 male
2. 18-29 female
3. 30-54 male
4. 30-54 female
.. and so on.
I think I have to do something with either compute or recode into different variables, but can't figure out what to do.
Who can help me?

I don't know if it's the 'correct' way to do it, but this is the way I got it done. Advice on how to improve it is still appreciated :)
IF (gender = 0 & agegroup =1) GenAge=1.
IF (gender = 0 & agegroup =2) GenAge=2.
IF (gender = 0 & agegroup =3) GenAge=3.
IF (gender = 1 & agegroup =1) GenAge=4.
IF (gender = 1 & agegroup =2) GenAge=5.
IF (gender = 1 & agegroup =3) GenAge=6.
EXECUTE.
VALUE LABELS GenAge
1 'Young man'
2 'Middle-aged man'
3 'Old man'
4 'Young woman'
5 'Middle-aged woman'
6 'Old woman'.

Look up the DO IF and/or IF command.
DO IF (Gender = 0 /* Male*/ AND Age = 1 /* 18 -29 */).
COMPUTE GenAge=1.
ELSE IF (Gender = 1 /* Female */ AND Age = 1 /* 18 -29 */).
COMPUTE GenAge=2.
ELSE IF (Gender = 0 /* Male */ AND Age = 2 /* 30 - 54 */).
COMPUTE GenAge=3.
ELSE IF (Gender = 1 /* Female */ AND Age = 2 /* 30 - 54 */).
COMPUTE GenAge=4.
ELSE IF (Gender = 0 /* Male */ AND Age = 3 /* 55 - 89 */).
COMPUTE GenAge=5.
ELSE IF (Gender = 1 /* Female */ AND Age = 3 /* 55 - 89 */).
COMPUTE GenAge=6.
END IF.
The text between each pair of /* and */ is a comment; it is entirely optional and only there to make clear what each code represents.
Instead of a series of IF statements like this (which becomes even more cumbersome with a large number of categories), I would typically code it in an alternative manner, such as the following:
RECODE Gender (0=2) (ELSE=COPY).
VALUE LABELS Gender 1 "Female" 2 "Male".
COMPUTE GenAge=SUM(Gender*10, Age).
VALUE LABELS GenAge
11 "Female 18 - 29"
12 "Female 30 - 54"
13 "Female 55 - 89"
21 "Male 18 - 29"
22 "Male 30 - 54"
23 "Male 55 - 89".
For categorical variables of this nature, the particular code assigned to each category is typically irrelevant, so I would always prefer a solution that involves writing as little code as possible and that does not depend on the data itself. If the order is important, you could instead have Age represented by the tens digit and Gender by the units digit.
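As a rough illustration of the arithmetic only (in Python rather than SPSS syntax, and assuming gender is coded 0/1 and the age group 1 to 3 as in the question), both coding schemes can be written as one-line formulas. The function names are made up for this sketch.

```python
def gen_age_sequential(gender: int, age: int) -> int:
    """Codes 1..6 in the order the question asks for:
    1 = 18-29 male, 2 = 18-29 female, 3 = 30-54 male, and so on."""
    return (age - 1) * 2 + gender + 1


def gen_age_two_digit(gender: int, age: int) -> int:
    """Two-digit code: age group in the tens place, gender in the units place."""
    return age * 10 + gender


# Print every combination so the mapping can be checked by eye.
for age in (1, 2, 3):
    for gender in (0, 1):
        print(age, gender, gen_age_sequential(gender, age), gen_age_two_digit(gender, age))
```

Either formula avoids writing one IF per category; the sequential one reproduces the 1 to 6 ordering requested in the question, while the two-digit one keeps both source values readable in the code itself.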

Related

Filter the first n cases in SPSS based on condition

I have a database in SPSS structured like the following table:
ID  Gender  Age  Var1  Var...
1   0       7    3     ...
2   1       8    4     ...
3   1       9    5     ...
4   1       9    2     ...
I want to select only the first n (e.g. 150) cases where Gender = 1 and Age = 9, so in the table above the 3rd and 4th cases. How can I do it? Thanks!
compute filter_ = $sysmis.
compute counter_ = 0.
if $casenum=1 and (Gender = 1 and Age = 9) counter_ =1 .
do if $casenum <> 1.
if ~(Gender = 1 and Age = 9) counter_ = lag(counter_).
if (Gender = 1 and Age = 9) counter_ = lag(counter_) +1.
end if.
compute filter_ = (Gender = 1 and Age = 9 and counter_ <= 150).
execute.
I am not sure if this is the most efficient way, but it gets the job done. We use the counter_ variable to assign an order number to each record which satisfies the condition ("counting" records which meet the criteria, from the top of the file downwards). Then we create a filter for the first 150 such records.
The below will select the first 150 cases where gender=1 AND age=9 (assuming 150 cases meet that criteria).
N 150.
SELECT IF (Gender=1 AND Age=9).
EXE .
Flipping the order of N and SELECT IF () would yield the same result. You can read more about N in the IBM documentation.
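The underlying idea in both answers is "walk the file from the top, keep rows that match, stop after n". A minimal Python sketch of that logic, using the four example rows from the question (the helper name first_n_matching is made up for illustration and is not an SPSS feature):

```python
def first_n_matching(rows, predicate, n):
    """Return the first n rows (in file order) for which predicate(row) is true."""
    selected = []
    for row in rows:
        if predicate(row):
            selected.append(row)
            if len(selected) == n:
                break  # enough cases found, stop reading
    return selected


rows = [
    {"ID": 1, "Gender": 0, "Age": 7},
    {"ID": 2, "Gender": 1, "Age": 8},
    {"ID": 3, "Gender": 1, "Age": 9},
    {"ID": 4, "Gender": 1, "Age": 9},
]

picked = first_n_matching(rows, lambda r: r["Gender"] == 1 and r["Age"] == 9, 150)
print([r["ID"] for r in picked])  # [3, 4]
```

If fewer than n rows match, all matching rows are returned, which mirrors the "assuming 150 cases meet the criteria" caveat above.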

Group by start and end date or join multiple columns in Power Query

I have an employees table with mutations to their contracts
EmpID Start End Function Hours SalesPercentage
1 01-01-2020 31-12-2020 FO Desk 40 1
1 01-01-2020 31-01-2021 FO Desk 32 1
1 01-02-2021 FO Desk 32 0.50
2 01-01-2021 31-01-2021 BO 32 0
2 01-02-2021 BO/FO 32 .25
For dynamic calculation of the number of employees and their sales percentages, I need to turn this into a table with an entry per month:
Year Month EmpID Hours SalesPercentage
2020 1 1 40 1
2020 2 1 40 1
..
2020 12 1 40 1
2021 1 1 32 1
2021 1 2 32 0
2021 2 1 32 0.50
2021 2 2 32 0.25
I have a simple Year Month table that I would like to append the mutation data to, but joining on multiple columns is not possible as far as I can tell. Is there a way around this?
Try this below
It generates a list of all year/month combinations for each row, then expands it and removes extra columns
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{ {"Start", type date}, {"End", type date}}),
    #"Added Custom" = Table.AddColumn(
        #"Changed Type",
        "newcol",
        each
            let
                begin = Date.StartOfMonth([Start]),
                End2 = if [End] = null then [Start] else [End]
            in
                List.Accumulate(
                    {0..(Date.Year(End2)-Date.Year([Start]))*12+(Date.Month(End2)-Date.Month([Start]))},
                    {},
                    (s,c) => s&{Date.AddMonths(begin,c)}
                )
    ),
    #"Expanded newcol" = Table.ExpandListColumn(#"Added Custom", "newcol"),
    #"Added Custom2" = Table.AddColumn(#"Expanded newcol", "Year", each Date.Year([newcol])),
    #"Added Custom3" = Table.AddColumn(#"Added Custom2", "Month", each Date.Month([newcol])),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom3",{"Start", "End", "Function", "newcol"})
in
    #"Removed Columns"
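The core of the query is the month arithmetic: the number of months between Start and End is (year difference) * 12 + (month difference), and one entry is generated per offset. A small Python sketch of just that step, with a hypothetical helper name and not tied to Power Query in any way:

```python
from datetime import date


def months_between(start: date, end: date):
    """Return (year, month) pairs for every month from start to end, inclusive."""
    count = (end.year - start.year) * 12 + (end.month - start.month)
    result = []
    for offset in range(count + 1):
        index = start.month - 1 + offset  # zero-based month counter from January of start.year
        result.append((start.year + index // 12, index % 12 + 1))
    return result


print(months_between(date(2021, 1, 1), date(2021, 1, 31)))  # [(2021, 1)]
print(len(months_between(date(2020, 1, 1), date(2021, 1, 31))))  # 13
```

Each pair corresponds to one expanded row for that contract line; an empty End date would need its own rule, as the query does with End2.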

Game Engine Collision Bitmask... Why 0x01 etc?

Coming across this situation both in Sprite Kit (iOS Development) and in Cocos2d-x (which I know was pretty much the inspiration for Sprite Kit, hence why they use a lot of the same tools), I finally decided to figure out why this happens:
When using a physics engine, I create a sprite, and add a physicsBody to it. For the most part, I understand how to set the category, collision, and contact bitmasks, and how they work. The problem is the actual bitmask number:
SpriteKit:
static const uint32_t missileCategory = 0x1 << 0;
sprite.physicsBody.categoryBitMask = missileCategory;
Cocos2D-X:
sprite->getPhysicsBody()->setCategoryBitmask(0x01); // 0001
I'm totally confused as to why I would write 0x01 or 0x1 << 0 in either case. I somewhat get that they're using hex, and it has something to do with 32-bit integers. And as far as I've been able to google, 0x01 is 0001 in binary which is 1 in decimal. And 0x02 is 0010 in binary which is 2 in decimal. Okay, so there are these conversions, but why in the world would I use them for something simple like categories?
As far as my logic goes, if I have, let's say, a player category, an enemy category, a missile category, and a wall category, that's just 4 categories. Why not use strings for the category? Or even just binary numbers that any non-CS person would understand like 0,1,2, and 3?
And finally, I'm confused why there are 32 different categories available? I thought a 32-bit integer had numbers 0-some billion number (unsigned of course). So why do I not have billions of different possible categories?
Is there some sort of optimization I am not understanding? Or this is just an old convention they use, but is not needed? Or is there something going on that someone with only 2 semesters of college course CS training wouldn't understand?
The reason for the bitmasks is that they let you (or the program) compute easily and very quickly whether a collision between two objects can occur. So yes, it is a sort of optimization.
Assuming we have the three categories
missile 0x1 << 0
player 0x1 << 1
wall 0x1 << 2
Now we have a Player instance, its category is set to player. Its collision bitmask is set to missile | player | wall (+ instead of | works too, as long as each category is added only once) since we want to be able to collide with all three types: other players, the level walls and the bullets / missiles flying around.
Now we have a Missile with category set to missile and collision bitmask set to player | wall: it does not collide with other missiles but hits players and walls.
If we now want to evaluate whether two objects can collide with each other, we take the category bitmask of the first one and the collision bitmask of the second one and simply & them.
The setup described above looks like the following in code:
let player : UInt8 = 0b1 << 0 // 00000001 = 1
let missile : UInt8 = 0b1 << 1 // 00000010 = 2
let wall : UInt8 = 0b1 << 2 // 00000100 = 4
let playerCollision = player | missile | wall // 00000111 = 7
let missileCollision = player | wall // 00000101 = 5
The subsequent reasoning is basically:
if player & missileCollision != 0 {
print("potential collision between player and missile") // prints
}
if missile & missileCollision != 0 {
print("potential collision between two missiles") // does not print
}
We are using some bit arithmetic here; each bit represents a category.
You could simply enumerate the categories 1, 2, 3, 4, 5... but then you could not do any math on them, because you would not know whether a 5 as category bitmask is really category 5 or an object of both categories 1 and 4.
However, using only bits we can do just that: the only representation of 7 in terms of powers of 2 is 4 + 2 + 1, so whatever object possesses collision bitmask 7 collides with categories 4, 2 and 1. And the one with bitmask 5 is exactly and only a combination of categories 1 and 4; there is no other way.
Since we are not enumerating but giving each category its own bit, and a regular integer has only 32 (or 64) bits, we can only have 32 (or 64) categories.
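The uniqueness claim can be checked mechanically. A short Python sketch (Python only for brevity; categories_in is a made-up helper, not part of SpriteKit or Cocos2d-x) that splits a mask back into its single-bit categories:

```python
def categories_in(mask: int):
    """Return the single-bit category values that are set in mask."""
    return [1 << bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


print(categories_in(7))  # [1, 2, 4]
print(categories_in(5))  # [1, 4]
```

Because every non-negative integer has exactly one binary representation, the list returned for a given mask is the only possible combination of categories that produces it.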
The following, slightly more extensive code demonstrates how the masks are used in more general terms:
let playerCategory : UInt8 = 0b1 << 0
let missileCategory : UInt8 = 0b1 << 1
let wallCategory : UInt8 = 0b1 << 2
struct EntityStruct {
var categoryBitmask : UInt8
var collisionBitmask : UInt8
}
let player = EntityStruct(categoryBitmask: playerCategory, collisionBitmask: playerCategory | missileCategory | wallCategory)
let missileOne = EntityStruct(categoryBitmask: missileCategory, collisionBitmask: playerCategory | wallCategory)
let missileTwo = EntityStruct(categoryBitmask: missileCategory, collisionBitmask: playerCategory | wallCategory)
let wall = EntityStruct(categoryBitmask: wallCategory, collisionBitmask: playerCategory | missileCategory | wallCategory)
func canTwoObjectsCollide(_ first: EntityStruct, _ second: EntityStruct) -> Bool {
    // True when the first object's category appears in the second object's collision mask.
    return (first.categoryBitmask & second.collisionBitmask) != 0
}
canTwoObjectsCollide(player, missileOne) // true
canTwoObjectsCollide(player, wall) // true
canTwoObjectsCollide(wall, missileOne) // true
canTwoObjectsCollide(missileTwo, missileOne) // false
The important part here is that the method canTwoObjectsCollide does not care about the type of the objects or how many categories there are. As long as you stick with bitmasks, that is all you need to determine whether or not two objects can theoretically collide (ignoring their positions, which is a task for another day).
luk2302's answer is great, but just to go a bit further and in other directions...
Why hex notation? (0x1 << 2 etc)
Once you know that bit positions are the important part, it is (as mentioned in comments) just a matter of style/readability. You could just as well do:
let catA = 0b0001
let catB = 0b0010
let catC = 0b0100
But binary literals like that are (as far as Apple tools are concerned) new to Swift and not available in ObjC.
You could also do:
static const uint32_t catA = 1 << 0;
static const uint32_t catB = 1 << 1;
static const uint32_t catC = 1 << 2;
or:
static const uint32_t catA = 1;
static const uint32_t catB = 2;
static const uint32_t catC = 4;
But, for historical/cultural reasons, it's become common convention among programmers to use hexadecimal notation as a way of reminding oneself/other readers of your code that a particular integer literal is significant more for its bit pattern than its absolute value. (Also, for the second C example you have to remember which bit has which place value, whereas with << operator or binary literals you can emphasize the position.)
Why bit patterns? Why not ___?
Using bit patterns / bit masks is a performance optimization. To check for collisions, a physics engine must examine every pair of objects in the world. Because it's pairwise, the performance cost is quadratic: if you have 4 objects, you have 4*4 = 16 possible collisions to check... 5 objects is 5*5 = 25 possible collisions, etc. You can cut that list down with some obvious exclusions (no worries about an object colliding with itself, A collides with B is the same as B collides with A, etc), but the growth is still quadratic; that is, for n objects, you have O(n^2) possible collisions to check. (And remember, we're counting total objects in the scene, not categories.)
Many interesting physics games have a lot more than 5 total objects in the scene, and run at 30 or 60 frames per second (or at least want to). That means the physics engine has to check all those possible collision pairs in 16 milliseconds. Or preferably, much less than 16 ms, because it still has other physics-y stuff to do before/after finding collisions, and the game engine needs time to render, and you probably want time for your game logic in there, too.
Bit mask comparisons are very fast. Something like the mask comparison:
if ((bodyA.categoryBitMask & bodyB.collisionBitMask) != 0)
...is one of the quickest things you can ask an ALU to do — like one or two clock cycles fast. (Anyone know where to track down actual cycles per instruction figures?)
By contrast, string comparison is an algorithm in itself, requiring a lot more time. (Not to mention some easy way to have those strings express the combinations of categories that should result in collisions.)
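To make the pairwise cost and the mask test concrete, here is a small Python sketch (Python only for brevity; the tuple layout and the may_collide helper are illustrative and not any engine's API). It enumerates every unordered pair of four bodies and keeps only those whose masks allow a collision, reusing the category values from the earlier Swift example:

```python
from itertools import combinations

PLAYER, MISSILE, WALL = 1 << 0, 1 << 1, 1 << 2

# (name, category mask, collision mask)
bodies = [
    ("player", PLAYER, PLAYER | MISSILE | WALL),
    ("missileOne", MISSILE, PLAYER | WALL),
    ("missileTwo", MISSILE, PLAYER | WALL),
    ("wall", WALL, PLAYER | MISSILE | WALL),
]


def may_collide(a, b):
    """One AND and one comparison: a's category against b's collision mask."""
    return (a[1] & b[2]) != 0


pairs = list(combinations(bodies, 2))  # n * (n - 1) / 2 unordered pairs
candidates = [(a[0], b[0]) for a, b in pairs if may_collide(a, b)]

print(len(pairs))   # 6
print(candidates)   # every pair except the two missiles
```

The number of pairs still grows quadratically with the number of bodies; the mask test only makes each individual check cheap and lets the engine discard pairs, such as the two missiles, before doing any geometry.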
A challenge
Since bit masks are a performance optimization, they might as well be a (private) implementation detail. But most physics engines, including SpriteKit's, leave them as part of the API. It'd be nicer to have a way of saying "these are my categories, these are how they should interact" at a high level, and let someone else handle the details of translating that description into bit masks. Apple's DemoBots sample code project appears to have one idea for simplifying such things (see ColliderType in the source)... feel free to use it or design your own.
To answer your specific question
"why there are 32 different categories available? I thought a 32-bit
integer had numbers 0-some billion number (unsigned of course). So why
do I not have billions of different possible categories?"
The answer is that the category is always treated as a 32-digit bit mask of which ONLY ONE bit should be set. So these are the valid values:
00000000000000000000000000000001 = 1 = 1 << 0
00000000000000000000000000000010 = 2 = 1 << 1
00000000000000000000000000000100 = 4 = 1 << 2
00000000000000000000000000001000 = 8 = 1 << 3
00000000000000000000000000010000 = 16 = 1 << 4
00000000000000000000000000100000 = 32 = 1 << 5
00000000000000000000000001000000 = 64 = 1 << 6
00000000000000000000000010000000 = 128 = 1 << 7
00000000000000000000000100000000 = 256 = 1 << 8
00000000000000000000001000000000 = 512 = 1 << 9
00000000000000000000010000000000 = 1024 = 1 << 10
00000000000000000000100000000000 = 2048 = 1 << 11
.
.
.
10000000000000000000000000000000 = 2,147,483,648 = 1 << 31
So there are 32 different categories available. Your categoryBitMask, however, can have multiple bits set, so it can indeed be any number from 1 to the maximum of UInt32. For example, in an arcade game you might have categories such as:
00000000000000000000000000000001 = 1 = 1 << 0 //Human
00000000000000000000000000000010 = 2 = 1 << 1 //Alien
00000000000000000000000000000100 = 4 = 1 << 2 //Soldier
00000000000000000000000000001000 = 8 = 1 << 3 //Officer
00000000000000000000000000010000 = 16 = 1 << 4 //Bullet
00000000000000000000000000100000 = 32 = 1 << 5 //laser
00000000000000000000000001000000 = 64 = 1 << 6 //powershot
So a human civilian might have a categoryBitMask of 1, a human soldier 5 (1 + 4), an alien officer 10 (2 + 8), a normal bullet 16, a missile 80 (16 + 64), a mega-death ray 96 (32 + 64), and so on.
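A quick way to sanity-check combined values like these is to build them with | and test membership with &. The Python sketch below uses the category list above; the variable names are made up for illustration.

```python
# One bit per category, in the same order as the list above.
HUMAN, ALIEN, SOLDIER, OFFICER, BULLET, LASER, POWERSHOT = (1 << i for i in range(7))

human_soldier = HUMAN | SOLDIER
missile = BULLET | POWERSHOT
mega_death_ray = LASER | POWERSHOT

print(human_soldier, missile, mega_death_ray)  # 5 80 96
print(bool(missile & BULLET))  # True: a missile counts as a bullet
print(bool(missile & LASER))   # False: it is not a laser
```

Because each category owns exactly one bit, | and + give the same result here, and a single & answers "does this object belong to that category?".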

Random values with different weights

Here's a question about entity framework that has been bugging me for a bit.
I have a table called prizes that has different prizes. Some with higher and some with lower monetary values. A simple representation of it would be as such:
+----+---------+--------+
| id | name | weight |
+----+---------+--------+
| 1 | Prize 1 | 80 |
| 2 | Prize 2 | 15 |
| 3 | Prize 3 | 5 |
+----+---------+--------+
Weight in this case is the likelihood with which I would like this item to be randomly selected.
I select one random prize at a time like so:
var prize = db.Prizes.OrderBy(r => Guid.NewGuid()).Take(1).First();
What I would like to do is use the weight to determine the likelihood of a random item being returned, so Prize 1 would return 80% of the time, Prize 2 15% and so on.
I thought that one way of doing that would be by having the prize on the database as many times as the weight. That way having 80 times Prize 1 would have a higher likelihood of being returned when compared to Prize 3, but this is not necessarily exact.
There has to be a better way of doing this, so I was wondering if you could help me out with this.
Thanks in advance
Normally I would not do this in database, but rather use code to solve the problem.
In your case, I would generate a random number from 1 to 100. If the number generated is between 1 and 80, the 1st one wins; if it's between 81 and 95, the 2nd one wins; and if it's between 96 and 100, the last one wins.
Because the random number could be any number from 1 to 100, each number has 1% of chance to be hit, then you can manage the winning chance by giving the range of what the random number falls into.
Hope this helps.
Henry
This can be done by creating bins for the three (generally, n) items and then dropping a randomly chosen number into one of those bins.
There might be a statistical library that could do this for you, i.e. select one of n bins in proportion to its size.
A solution that does not limit you to three prizes/weights could be implemented like below:
//All Prizes in the Database
var allRows = db.Prizes.ToList();
//All Weight Values
var weights = allRows.Select(p => new { p.Weight }).ToList();
//Sum of Weights
var weightsSum = weights.AsEnumerable().Sum(w => w.Weight);
//Construct Bins e.g. [0, 80, 95, 100]
//Three Bins are: (0-80],(80-95],(95-100]
int[] bins = new int[weights.Count() + 1];
int start = 0;
bins[start] = 0;
foreach (var weight in weights) {
start++;
bins[start] = bins[start - 1] + weight.Weight;
}
//Generate a random number between 1 and weightsSum (inclusive)
Random rnd = new Random();
int selection = rnd.Next(1, weightsSum + 1);
//Assign random number to the bin
int chosenBin = 0;
for (chosenBin = 0; chosenBin < bins.Length - 1; chosenBin++)
{
if (bins[chosenBin] < selection && selection <= bins[chosenBin + 1])
{
break;
}
}
//Announce the Prize
Console.WriteLine("You have won: " + allRows.ElementAt(chosenBin));
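For comparison, the same cumulative-bin idea fits in a few lines of Python using the standard library. This is only a sketch of the technique, not Entity Framework code, and pick_index is a made-up helper; it takes the roll as a parameter so the mapping from roll to bin can be checked deterministically.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def pick_index(weights, roll):
    """Map a roll in 1..sum(weights) to the index of the bin it falls in."""
    upper_bounds = list(accumulate(weights))  # [80, 95, 100] for weights 80, 15, 5
    return bisect_left(upper_bounds, roll)    # first bin whose upper bound >= roll


weights = [80, 15, 5]
roll = random.randint(1, sum(weights))  # inclusive on both ends
print(pick_index(weights, roll))
```

With these weights, rolls 1 to 80 select index 0, rolls 81 to 95 select index 1, and rolls 96 to 100 select index 2. Python's random.choices(population, weights=...) performs a weighted draw directly if the explicit bins are not needed.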

Extracting sampled Time Points

I have a MATLAB curve, and I would like to find and plot the concentration values at 17 different time samples.
Following is the curve from which I would like to extract concentration values at 17 different time points.
The time points, in minutes, are:
t = 0, 0.25, 0.50, 1, 1.5, 2, 3, 4, 9, 14, 19, 24, 29, 34, 39, 44, 49
Following is the function which I have written to plot the above graph:
function c_t = output_function_constrainedK2(t, a1, a2, a3,b1,b2,b3,td, tmax,k1,k2,k3)
K_1 = (k1*k2)/(k2+k3);
K_2 = (k1*k3)/(k2+k3);
DV_free= k1/(k2+k3);
c_t = zeros(size(t));
ind = (t > td) & (t < tmax);
c_t(ind)= conv(((t(ind) - td) ./ (tmax - td) * (a1 + a2 + a3)),(K_1*exp(-(k2+k3)*t(ind)+K_2)),'same');
ind = (t >= tmax);
c_t(ind)= conv((a1 * exp(-b1 * (t(ind) - tmax))+ a2 * exp(-b2 * (t(ind) - tmax))) + a3 * exp(-b3 * (t(ind) - tmax)),(K_1*exp(-(k2+k3)*t(ind)+K_2)),'same');
plot(t,c_t);
axis([0 50 0 1400]);
xlabel('Time[mins]');
ylabel('concentration [Mbq]');
title('Model :Constrained K2');
end
If possible, please suggest how I could alter the above function so that it returns the concentration values at the 17 time points stated above.
Following are the input values that I have used to come up with the curve:
output_function_constrainedK2(0:0.1:50,2501,18500,65000,0.5,0.7,0.3,3,8,0.014,0.051,0.07)
This will give you concentration values at the time points you wanted. You will have to put this inside the output_function_constrainedK2 function so that you can access the variables t and c_t.
T=[0 0.25 0.50 1 1.5 2 3 4 9 14 19 24 29 34 39 44 49];
concentration=interp1(t,c_t,T)
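For readers unfamiliar with what a linear interpolation lookup does, here is a rough Python sketch of the idea (an illustration under simple assumptions, not MATLAB's implementation; interp_linear and the sample data are made up). It finds the two sampled points that bracket the requested time and interpolates on the straight line between them.

```python
from bisect import bisect_right


def interp_linear(xs, ys, x):
    """Linearly interpolate y at x; xs must be strictly increasing with at least two points."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the sampled range")
    i = min(bisect_right(xs, x), len(xs) - 1)  # index of the right-hand neighbour
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Made-up samples standing in for t and c_t.
t = [0.0, 1.0, 2.0]
c_t = [0.0, 10.0, 40.0]
print([interp_linear(t, c_t, x) for x in (0.25, 1.0, 1.5)])  # [2.5, 10.0, 25.0]
```

When a requested time coincides with a sampled time, as most of the 17 points do on a 0:0.1:50 grid, the result is simply the sampled value, up to floating-point rounding.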
