SPSS counting changes between variables - spss

I have a dataset that has three variables which indicate a category of event at three time points (dispatch, beginning, end). I want to establish the number of cases where (a) the category is the same for all three time points (b) those which have changed at time point 2 (beginning) and (c) those which have changed at time point 3 (end).
Can anyone recommend some syntax or a starting point?

To measure a change (non-equivalent) against T0 (Time zero or in your case Dispatch), wouldn't you simply check for equivalence between respective variables?:
DATA LIST FREE /ID T0 T1 T2.
BEGIN DATA.
1 1 1 1.
2 1 1 0.
3 1 0 1.
4 0 1 1.
5 1 0 0.
6 0 1 0.
7 0 0 1.
8 0 0 0.
END DATA.
COMPUTE ChangeT1=T0<>T1.
COMPUTE ChangeT2=T0<>T2.
To check all the values are the same across all three variables would be just (given you have string variables else otherwise you could do this differently if working with numeric variables such as Standard deviation):
COMPUTE CheckNoChange=T0=T1 & T0=T2.

Related

Why is BFS more efficient than DFS in this scenario?

tldr;
You start at 3 and want to end up at 4. There is always a guaranteed path. You can only hop onto 1's. You move like a knight, m units in one direction, and n units in another, every time. What is the least number of hops to get to your destination.
Input:
1 2
1 0 1 0 1
3 0 2 0 4
0 1 2 0 0
0 0 0 1 0
You start at 3 and hop to the 1 at the middle top, then to 4. Each hop is 1 unit in 1 direction and 2 units in another. Thus, the answer for this case is 2. Why is BFS better than DFS in this situation?
Breadth-first search is guaranteed to find the shortest path from the start to the goal, while depth-first search isn’t.

hypothesis function space in decision tree

I am reading the book "Artificial Intelligence" by Stuart Russell and Peter Norvig (Chapter 18). The following paragraph is from the decision trees context.
For a wide variety of problems, the decision tree format yields a
nice, concise result. But some functions cannot be represented
concisely. For example, the majority function, which returns true if
and only if more than half of the inputs are true, requires an
exponentially large decision tree.
In other words, decision trees are good for some kinds of functions
and bad for others. Is there any kind of representation that is
efficient for all kinds of functions? Unfortunately, the answer is no.
We can show this in a general way. Consider the set of all Boolean
functions on "n" attributes. How many different functions are in this
set? This is just the number of different truth tables that we can
write down, because the function is defined by its truth table.
A truth table over "n" attributes has 2^n rows, one for each
combination of values of the attributes.
We can consider the “answer” column of the table as a 2^n-bit number
that defines the function. That means there are (2^(2^n)) different
functions (and there will be more than that number of trees, since
more than one tree can compute the same function). This is a scary
number. For example, with just the ten Boolean attributes of our
restaurant problem there are 2^1024 or about 10^308 different
functions to choose from.
What does author mean by "answer" column of the table as a 2^n-bit number that defines the function?
How did author derive (2^(2^n)) different functions?
Please elaborate on above question, preferably with simple example, such as n = 3.
Consider a general truth table for a 3-input function, where the result for each triple is also a Boolean (1 or 0), represented by variables i through 'p':
A B C f(a,b,c)
0 0 0 i
0 0 1 j
0 1 0 k
0 1 1 l
1 0 0 m
1 0 1 n
1 1 0 o
1 1 1 p
We can now represent any function on three variables as an 8-bit number, ijklmnop. For instance, and is 00000001; or is 01111111; one_hot (exactly one input True) is 01101000.
For 3 variables, you have 2^3 bits in the "answer", the complete function definition. Since there are 8 bits in the "answer", there are 2^8 possible functions we can define.
Does that outline the field of comprehension for you?
More detail on an example function
You simply (once you see the pattern) make the eight bits correspond to the entires in the table. For instance, the table for one-hot looks like this:
A B C f(a,b,c)
0 0 0 0
0 0 1 1
0 1 0 1
0 1 1 0
1 0 0 1
1 0 1 0
1 1 0 0
1 1 1 0
Reading down the "answer" column, labeled f(a,b,c), you get the 8-bit sequence 01101000. That 8-bit number is sufficient to completely define the function: the rows listing all the combinations of a, b, c are in a fixed (numerical) sequence.
You can write any such function in a template format:
def and(a, b, c):
and_def = '00000001'
index = 4*a + 2*b + 1*c
return and_def[index]
Now, if we generalize this to any 3-input binary function:
def_bin_func(a, b, c, func_def)
return func_def[4*a + 2*b + 1*c]
If you wish, you can further generalize the template for a list of inputs: concatenate the bits and use that integer as the index into the func_def string.
Does that clear it up?

Performing exact match when comparing variables in SPSS Statistics

I'm wondering if there's a way for me to perform an exact match compare in SPSS. Currently, using the following will return system missing (null) in cases where one variable is sysmis:
compute var1_comparison = * Some logic here.
compute var1_check = var1 = var1_comparison.
The results look like this (hypens representing null values):
ID var1 var1_comparison var1_check
1 3 3 1
2 4 3 0
3 - 2 -
4 1 1 1
5 - - -
What I want is this:
ID var1 var1_comparison var1_check
1 3 3 1
2 4 3 0
3 - 2 0
4 1 1 1
5 - - 1
Is this possible using just plain SPSS syntax? I'm also open to using the Python extension, though I'm not as familiar with it.
Here's a slightly different approach, using temporary scratch variables (prefixed by a hash (#)):
recode var1 var1_comparison (sysmis=-99) (else=copy) into #v1 #v2.
compute Check=(#v1 = #v2).
This is to recreate your example:
data list list/ID var1 var1_comparison.
begin data
1, 3, 3
2 , 4, 3
3, , 2
4, 1, 1
5, ,
end data.
Now you have to deal separately with the situation where both values are missing, and then complete the calculation in all other situations:
do if missing(var1) or missing(var1_comparison).
compute var1_check=(missing(var1) and missing(var1_comparison)).
else.
compute var1_check = (var1 = var1_comparison).
end if.

What is the relation between address lines and memory?

These are my assignments:
Write a program to find the number of address lines in an n Kbytes of memory. Assume that n is always to the power of 2.
Sample input: 2
Sample output: 11
I don't need specific coding help, but I don't know the relation between address lines and memory.
To express in very easy terms, without any bus-multiplexing, the number of bits required to address a memory is the number of lines (address or data) required to access that memory.
Quoting from the Wikipedia article,
a system with a 32-bit address bus can address 232 (4,294,967,296) memory locations.
for a simple example, consider this, you have 3 address lines (A, B, C), so the values which can be formed using 3 bits are
A B C
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
Total 8 values. So using ABC, you can access any of those eight values, i.e., you can reach any of those memory addresses.
So, TL;DR, the simple relationship is, with n number of lines, we can represent 2n number of addresses.
An address line usually refers to a physical connection between a CPU/chipset and memory. They specify which address to access in the memory. So the task is to find out how many bits are required to pass the input number as an address.
In your example, the input is 2 kilobytes = 2048 = 2^11, hence the answer 11. If your input is 64 kilobytes, the answer is 16 (65536 = 2^16).

Counting data using SPSS syntax

I have the following SPSS syntax to count using a conditional
DATASET ACTIVATE Conjunto_de_datos1.
DO IF (((p7_1 = 1) | (p7_2 = 1)) & (periodo = 2)).
COUNT noque_o_noria=p7_2 p7_1(1).
END IF.
EXECUTE.
the data is the folowing
p7_1 p7_2 periodo
1 1 2
1 0 2
1 1 2
1 1 1
1 1 1
0 1 2
The problem I have is that in the new column each row that meet the rule is given automatically the value 2, and the ones that don't meet the rule are lost values (empty).
What should I add to the code above to retrieve me 1 when it meets the rule and 0 when not?
You don't need so much syntax to do that. Just
compute noque_o_noria=(p7_2 = 1 or p7_1 = 1) and periodo = 2.
will do.
There is no point for the COUNT command, so you can use a COMPUTE noque_o_noria = 1 instead and then specify an ELSE condition, e.g.
DO IF (((p7_1 = 1) | (p7_2 = 1)) & (periodo = 2)).
COMPUTE noque_o_noria = 1.
ELSE.
COMPUTE noque_o_noria = 0.
END IF.
I suspect that the periodo variable was previously defined, and the DO IF is leaving the old values unchanged.
If the variable is new, then cases bypassed by DO IF will have the sysmis value. For cases that are processed by COUNT, the variable is initialized to zero for each case.

Resources