Filter the first n cases in SPSS based on condition

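I have a database in SPSS structured like the following table:
+----+--------+-----+------+--------+
| ID | Gender | Age | Var1 | Var... |
+----+--------+-----+------+--------+
| 1  | 0      | 7   | 3    | ...    |
| 2  | 1      | 8   | 4    | ...    |
| 3  | 1      | 9   | 5    | ...    |
| 4  | 1      | 9   | 2    | ...    |
+----+--------+-----+------+--------+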
I want to select only the first n (e.g. 150) cases where Gender = 1 and Age = 9, so in the table above, the 3rd and 4th cases. How can I do it? Thanks!

compute filter_ = $sysmis.
* Number each case that meets the condition, counting from the top of the file.
compute counter_ = 0.
if $casenum = 1 and (Gender = 1 and Age = 9) counter_ = 1.
do if $casenum <> 1.
if ~(Gender = 1 and Age = 9) counter_ = lag(counter_).
if (Gender = 1 and Age = 9) counter_ = lag(counter_) + 1.
end if.
* Flag the first 150 cases that meet the condition.
compute filter_ = (Gender = 1 and Age = 9 and counter_ <= 150).
execute.
I am not sure this is the most efficient way, but it gets the job done. The counter_ variable assigns an order number to each record that satisfies the condition (counting the records which meet the criteria from the top of the file downwards). The filter_ variable then flags the first 150 such records.
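For readers who want to follow the counting logic outside SPSS, here is a minimal Python sketch of the same idea. The record layout and the helper name flag_first_n are assumptions made for illustration; they are not part of the SPSS syntax above.

```python
def flag_first_n(records, n):
    """Return one 0/1 flag per record, marking the first n matching records."""
    flags = []
    counter = 0  # running count of records that met the condition so far
    for rec in records:
        matches = rec["Gender"] == 1 and rec["Age"] == 9
        if matches:
            counter += 1
        flags.append(1 if matches and counter <= n else 0)
    return flags


# Hypothetical data mirroring the table in the question
cases = [
    {"ID": 1, "Gender": 0, "Age": 7},
    {"ID": 2, "Gender": 1, "Age": 8},
    {"ID": 3, "Gender": 1, "Age": 9},
    {"ID": 4, "Gender": 1, "Age": 9},
]
print(flag_first_n(cases, 1))  # [0, 0, 1, 0]
```

With n = 1 only the 3rd case is flagged, because the 4th case is the second match and its counter value exceeds the limit.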

The syntax below will select the first 150 cases where Gender=1 AND Age=9 (assuming at least 150 cases meet those criteria).
N 150.
SELECT IF (Gender=1 AND Age=9).
EXE .
Flipping the order of N and SELECT IF () would yield the same result. You can read more about N (short for N OF CASES) in the IBM documentation.
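As a rough illustration of what this combination does (keep the matching cases, then stop after the first n of them), here is a small Python sketch. The function name first_n_matching and the sample records are hypothetical, and the sketch says nothing about how SPSS implements N internally.

```python
from itertools import islice


def first_n_matching(records, n):
    """Keep only matching records, then stop after the first n of them."""
    matching = (r for r in records if r["Gender"] == 1 and r["Age"] == 9)
    return list(islice(matching, n))


# Hypothetical data mirroring the table in the question
cases = [
    {"ID": 1, "Gender": 0, "Age": 7},
    {"ID": 2, "Gender": 1, "Age": 8},
    {"ID": 3, "Gender": 1, "Age": 9},
    {"ID": 4, "Gender": 1, "Age": 9},
]
print([r["ID"] for r in first_n_matching(cases, 150)])  # [3, 4]
```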

Related

How to return a count of fields with a given value in a record?

I have a database table with the following fields :
---------------------
FIELDS : | H1 | H2 | H3 | H4
---------------------
VALUES : | A | B | A | C
---------------------
For a given record (row), I would like to count the number of fields with a value of A. In the above, for example, there are two fields with a value of A, so the expected result would be : 2
How can I achieve this?
I am trying to answer the question from a database point of view.
You have a table with one or more rows and every row has in the four columns either an 'A' or something else. For a given row (or for many rows) you want to get the number of columns that have an 'A' in it.
As one commenter pointed out, you can't sum letters, but you can check whether a value is the one you are looking for and count each occurrence as a 1 or a 0. Finally, sum those values and return the sum.
SELECT (CASE H1 WHEN 'A' THEN 1 ELSE 0 END) +
(CASE H2 WHEN 'A' THEN 1 ELSE 0 END) +
(CASE H3 WHEN 'A' THEN 1 ELSE 0 END) +
(CASE H4 WHEN 'A' THEN 1 ELSE 0 END) AS number_of_a
FROM name_of_your_table;
For your example row this will return:
NUMBER_OF_A
===========
2
If you have more than one row you'll get the number of As for every row.
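To try the query without a database server, the following sketch runs it against an in-memory SQLite database through Python's sqlite3 module. The table name demo and the second sample row are assumptions added for illustration.

```python
import sqlite3

# In-memory database with a hypothetical table named "demo"
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (H1 TEXT, H2 TEXT, H3 TEXT, H4 TEXT)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?, ?, ?)",
    [("A", "B", "A", "C"), ("B", "B", "C", "C")],
)

# Each CASE expression contributes 1 when the column holds 'A', else 0
query = """
SELECT (CASE H1 WHEN 'A' THEN 1 ELSE 0 END) +
       (CASE H2 WHEN 'A' THEN 1 ELSE 0 END) +
       (CASE H3 WHEN 'A' THEN 1 ELSE 0 END) +
       (CASE H4 WHEN 'A' THEN 1 ELSE 0 END) AS number_of_a
FROM demo
ORDER BY rowid
"""
counts = [row[0] for row in conn.execute(query)]
conn.close()
print(counts)  # [2, 0]
```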
I tested this and it works. Thanks for the help.
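SELECT count(CASE WHEN T.H1 = 'A' THEN 1 END) + count(CASE WHEN T.H2 = 'A' THEN 1 END) +
count(CASE WHEN T.H3 = 'A' THEN 1 END) + count(CASE WHEN T.H4 = 'A' THEN 1 END) +
count(CASE WHEN T.H5 = 'A' THEN 1 END) + count(CASE WHEN T.H6 = 'A' THEN 1 END) +
count(CASE WHEN T.H7 = 'A' THEN 1 END) + count(CASE WHEN T.H8 = 'A' THEN 1 END) as TOT
from Table T
where T.H1 = 'A' or T.H2 = 'A' or T.H3 = 'A' or T.H4 = 'A'
or T.H5 = 'A' or T.H6 = 'A' or T.H7 = 'A' or T.H8 = 'A'
group by T.ID
order by 1 DESC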
Other solution ...

Torch: Concatenating tensors of different dimensions

I have an x_at_i = torch.Tensor(1,i) that grows at every iteration, where i = 0 to n. I would like to concatenate all tensors of different sizes into a matrix and fill the remaining cells with zeroes. What is the most idiomatic way to do this? For example:
x_at_1 = 1
x_at_2 = 1 2
x_at_3 = 1 2 3
x_at_4 = 1 2 3 4
X = torch.cat(x_at_1, x_at_2, x_at_3, x_at_4)
X = [ 1 0 0 0
1 2 0 0
1 2 3 0
1 2 3 4 ]
If you know n, and assuming you can easily access your x_at_i at each iteration, I would try something like:
X = torch.Tensor(n, n):zero()
for i = 1, n do
  X[i]:narrow(1, 1, i):copy(x_at[i])
end
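The zero-padding idea itself can be sketched with plain Python lists, independent of Torch. The helper name pad_rows is hypothetical and this is not the Torch API; it only shows the target layout that the loop above fills in.

```python
def pad_rows(rows, width=None):
    """Right-pad each row with zeros so that all rows share the same width."""
    if width is None:
        width = max((len(r) for r in rows), default=0)
    # Rows already at least `width` long are copied unchanged
    return [list(r) + [0] * (width - len(r)) for r in rows]


rows = [[1], [1, 2], [1, 2, 3], [1, 2, 3, 4]]
X = pad_rows(rows)
for row in X:
    print(row)
```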

How to count the number of repetitions for every single number?

So I have a for loop that loops 100 times and each time it generates a random number from 1 to 100. For some statistics, I need to count how many times each number repeats. I have no idea how to make it other than manually.
One = 0
Two = 0
Three = 0
Four = 0
Five = 0
for i=1, 100 do
  number = GetRandomNumber(1, 5, 1.5)
  if number == 1 then
    One = One + 1
  elseif number == 2 then
    Two = Two + 1
  elseif number == 3 then
    Three = Three + 1
  elseif number == 4 then
    Four = Four + 1
  elseif number == 5 then
    Five = Five + 1
  end
end
This is how I currently count, but I don't want to manually type for every number. How can I make this simpler?
I do it as such:
number_counter, number = {}, 0
for i = 1, 100 do
  number = GetRandomNumber(1, 5, 1.5)
  if number_counter[number] then
    number_counter[number] = number_counter[number] + 1
  else
    number_counter[number] = 1
  end
end
This is, of course, assuming there are no half points (not sure what the 1.5 is for). Then you can just call number_counter[#] to see what its value is.
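The same table-based tally can be sketched in Python with collections.Counter. Here random.randint stands in for GetRandomNumber, whose behaviour (including the 1.5 argument) is not known from the question, and the fixed seed is only there to make runs repeatable.

```python
import random
from collections import Counter

rng = random.Random(0)  # seeded stand-in for the unknown GetRandomNumber

# Tally 100 draws; values that were never drawn simply report a count of 0
counts = Counter(rng.randint(1, 5) for _ in range(100))

for value in range(1, 6):
    print(value, counts[value])
```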

Random values with different weights

Here's a question about entity framework that has been bugging me for a bit.
I have a table called prizes that has different prizes. Some with higher and some with lower monetary values. A simple representation of it would be as such:
+----+---------+--------+
| id | name | weight |
+----+---------+--------+
| 1 | Prize 1 | 80 |
| 2 | Prize 2 | 15 |
| 3 | Prize 3 | 5 |
+----+---------+--------+
Weight in this case is the likelihood with which I would like this item to be randomly selected.
I select one random prize at a time like so:
var prize = db.Prizes.OrderBy(r => Guid.NewGuid()).Take(1).First();
What I would like to do is use the weight to determine the likelihood of a random item being returned, so Prize 1 would return 80% of the time, Prize 2 15% and so on.
I thought that one way of doing that would be by having the prize on the database as many times as the weight. That way having 80 times Prize 1 would have a higher likelihood of being returned when compared to Prize 3, but this is not necessarily exact.
There has to be a better way of doing this, so I was wondering if you could help me out with this.
Thanks in advance
Normally I would not do this in database, but rather use code to solve the problem.
In your case, I would generate a random number from 1 to 100. If the number is between 1 and 80, the 1st prize wins; if it is between 81 and 95, the 2nd one wins; and if it is between 96 and 100, the last one wins.
Because the random number could be any number from 1 to 100, each number has a 1% chance of being hit, so you control each prize's chance of winning through the size of the range assigned to it.
Hope this helps.
Henry
This can be done by creating bins for the three (generally, n) items and then picking a random number that is dropped into one of those bins.
There might be a statistical library that could do this for you, i.e. select one of n bins in proportion to its weight.
A solution that does not limit you to three prizes/weights could be implemented like below:
// All prizes in the database, loaded once so rows and weights share the same order
var allRows = db.Prizes.ToList();
// Sum of weights
int weightsSum = allRows.Sum(p => p.Weight);
// Construct bins, e.g. [0, 80, 95, 100]
// The three bins are: (0-80], (80-95], (95-100]
int[] bins = new int[allRows.Count + 1];
bins[0] = 0;
for (int i = 0; i < allRows.Count; i++)
{
    bins[i + 1] = bins[i] + allRows[i].Weight;
}
// Generate a random number between 1 and weightsSum (inclusive)
Random rnd = new Random();
int selection = rnd.Next(1, weightsSum + 1);
// Find the bin that contains the random number
int chosenBin = 0;
for (chosenBin = 0; chosenBin < bins.Length - 1; chosenBin++)
{
    if (bins[chosenBin] < selection && selection <= bins[chosenBin + 1])
    {
        break;
    }
}
// Announce the prize (assumes the entity exposes a Name property for the name column)
Console.WriteLine("You have won: " + allRows[chosenBin].Name);
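For comparison, here is a compact Python sketch of the same cumulative-bin technique, using a binary search instead of a linear scan. The helper names pick_index and pick_weighted are hypothetical, and the prize list is taken from the table in the question.

```python
import bisect
import itertools
import random


def pick_index(weights, selection):
    """Map a number in 1..sum(weights) to the index of the bin containing it."""
    upper_bounds = list(itertools.accumulate(weights))  # e.g. [80, 95, 100]
    return bisect.bisect_left(upper_bounds, selection)


def pick_weighted(items, weights, rng=random):
    """Draw one item with probability proportional to its weight."""
    selection = rng.randint(1, sum(weights))  # inclusive on both ends
    return items[pick_index(weights, selection)]


prizes = ["Prize 1", "Prize 2", "Prize 3"]
weights = [80, 15, 5]
print(pick_weighted(prizes, weights))
```

Python's standard library also offers random.choices(prizes, weights=weights), which performs a weighted draw directly; the explicit version is shown only because it mirrors the bins described above.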

Recode or compute age categories based by gender

I'm a complete spss-novice and I can't figure it out. Google doesn't give any answers either (or I don't know how to google this question.. that's possible too).
I have to make a new variable based on two variables:
Gender (0=male, 1=female)
and
oudad (1=age 18-29; 2=age 30-54; 3=age 55-89; 4=deselect 99).
The new variable has to have six categories:
1. 18-29 male
2. 18-29 female
3. 30-54 male
4. 30-54 female
.. and so on.
I think I have to do something with either compute or recode into different variables, but can't figure out what to do.
Who can help me?
I don't know if it's the 'correct' way to do it, but this is how I got it done. Advice on how to improve it is still appreciated :)
IF (gender = 0 & agegroup =1) GenAge=1.
IF (gender = 0 & agegroup =2) GenAge=2.
IF (gender = 0 & agegroup =3) GenAge=3.
IF (gender = 1 & agegroup =1) GenAge=4.
IF (gender = 1 & agegroup =2) GenAge=5.
IF (gender = 1 & agegroup =3) GenAge=6.
EXECUTE.
VALUE LABELS GenAge
1 'Young man'
2 'Middle-aged man'
3 'Old man'
4 'Young woman'
5 'Middle-aged woman'
6 'Old woman'.
Look up the DO IF and/or IF command.
DO IF (Gender = 0 /* Male*/ AND Age = 1 /* 18 -29 */).
COMPUTE GenAge=1.
ELSE IF (Gender = 1 /* Female */ AND Age = 1 /* 18 -29 */).
COMPUTE GenAge=2.
ELSE IF (Gender = 0 /* Male */ AND Age = 2 /* 30 - 54 */).
COMPUTE GenAge=3.
ELSE IF (Gender = 1 /* Female */ AND Age = 2 /* 30 - 54 */).
COMPUTE GenAge=4.
ELSE IF (Gender = 0 /* Male */ AND Age = 3 /* 55 - 89 */).
COMPUTE GenAge=5.
ELSE IF (Gender = 1 /* Female */ AND Age = 3 /* 55 - 89 */).
COMPUTE GenAge=6.
END IF.
The text between each pair of /* and */ is a comment; it is there only to make the code more readable and to show what each condition represents, so it is entirely optional.
Instead of a series of IF statements in this manner (which could become even more cumbersome with a large number of categories), I would typically code something like this in an alternative manner, as follows:
RECODE Gender (0=2) (ELSE=COPY).
VALUE LABELS Gender 1 "Female" 2 "Male".
COMPUTE GenAge=SUM(Gender*10, Age).
VALUE LABELS GenAge
11 "Female 18 - 29"
12 "Female 30 - 54"
13 "Female 55 - 89"
21 "Male 18 - 29"
22 "Male 30 - 54"
23 "Male 55 - 89".
For categorical variables of this nature, the specific code assigned to each category is typically irrelevant, so I prefer a solution that involves writing as little code as possible and that does not depend on the data itself. If the order is important, you could instead represent Age by the tens digit and Gender by the units digit.
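The arithmetic behind the combined code can be checked with a short Python sketch. The function name gen_age and the LABELS dictionary are illustrative only; the codes follow the recoded scheme above (Gender 1 = female, 2 = male; Age group 1 to 3).

```python
def gen_age(gender, age_group):
    """Combine recoded gender (tens digit) and age group (units digit)."""
    return gender * 10 + age_group


# Labels keyed by the combined code, as in the VALUE LABELS command
LABELS = {
    11: "Female 18 - 29",
    12: "Female 30 - 54",
    13: "Female 55 - 89",
    21: "Male 18 - 29",
    22: "Male 30 - 54",
    23: "Male 55 - 89",
}

print(LABELS[gen_age(2, 3)])  # Male 55 - 89
```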

Resources