How to reference a particular row for an existing variable in SPSS syntax? - spss

I have 2 variables, one for raw p-values and another for adjusted p-values. I need to compute a new variable based on the values of these two variables. What I need to do isn't too complicated, but I have a hard time doing it in SPSS because I can't figure out how I can reference a particular row for an existing variable in SPSS syntax.
The first column lists raw p-values in ascending order. The next column lists adjusted p-values, but these adjusted p-values are still incomplete. I need to compare each pair of adjacent values in the adjusted p-values column (e.g., rows 1 and 2, rows 2 and 3, rows 3 and 4, and so forth), take whichever p-value is smaller in each comparison, and enter it into the following column as the value of a new variable.
However, that's not the end of the story. One more condition has to be met. That is, the new p-values have to be in the same order as the raw p-values. However, I cannot ensure this if I start the comparisons from the top row. You can see that (i') is greater than (h') and (g'), and (d') is greater than (c'), (b'), and (a') in the example below (picture).
To solve this issue, I would need to start the comparison of the adjusted p-values from the bottom. In addition, I would need to compare each adjusted p-value to the new p-value one row below. The one exception is that I can simply use the value of (a) as the value of (a'), since as a rule (a) should always be the greatest of all the p-values. Then, for (b'), I need to compare (b) and (a') and enter whichever is smaller as (b'). For (c'), I need to compare (c) and (b') and enter whichever is smaller as (c'), and so forth. Done this way, (d') would be 0.911 and (i') would be 0.017.
Sorry for this long post, but I would really appreciate if I can get some help to do this task in SPSS.
Thank you in advance for your help.
Raw p-values | Adjusted p-values (Temporal)| New p-values (Final)
-------------|-----------------------------|---------------------
0.002 | 0.030 (i) | 0.025 (i')
0.003 | 0.025 (h) | 0.017 (h')
0.004 | 0.017 (g) | 0.017 (g')
0.005 | 0.028 (f) | 0.028 (f')
0.023 | 0.068 (e) | 0.068 (e')
0.450 | 1.061 (d) | 1.061 (d')
0.544 | 1.145 (c) | 0.911 (c')
0.850 | 0.911 (b) | 0.911 (b')
0.974 | 0.974 (a) | 0.974 (a')
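The bottom-up rule described above is a running minimum taken from the last row upward. As a minimal sketch outside SPSS (Python, with the hypothetical function name `backward_running_min`), applied to the adjusted p-values from the table:

```python
def backward_running_min(adjusted):
    """Return new p-values: each row is min(own adjusted value, new value one row below)."""
    result = list(adjusted)
    # The last row (a) is kept as is; walk upward from the second-to-last row.
    for i in range(len(result) - 2, -1, -1):
        result[i] = min(result[i], result[i + 1])
    return result

# Adjusted p-values in table order, rows (i) down to (a).
adjusted = [0.030, 0.025, 0.017, 0.028, 0.068, 1.061, 1.145, 0.911, 0.974]
final = backward_running_min(adjusted)
print(final)
```

With this rule the new values never decrease from top to bottom, so they keep the same order as the raw p-values.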

Another tool that may be convenient is the SHIFT VALUES command. It can move one or more columns of data either forward or backward.
I wonder whether the purpose of this has to do with adjusting p-values for multiple testing corrections, as with Benjamini-Hochberg FDR or similar methods. If that is the case, you might find the STATS PADJUST (Analyze > Descriptives > Calculate adjusted p values) extension command useful. It offers six adjustment methods. You can install it from the Utilities (pre-V24) or Extensions (V24+) menu.

To get you started, here are a few tools that can help you with this task:
The LAG function
With LAG you can compare the value in the current line with the value in the previous one. For example, the following compares Pval in each line to Pval in the previous line and puts the smaller of the two in NewPVal:
compute NewPVal=min(Pval, lag(Pval)).
If you want to do the same process only start from the bottom, you can easily sort your data in reverse order and do the same.
CREATE + LEAD
If you want to make comparisons to the next line instead of the previous line, you should first create a "lead" variable and then compare to it.
For example, the following syntax creates a new variable that, for each line, contains the value of Pval in the next line, and then chooses the smaller of the two for NewPVal:
create /LeadPval=LEAD(Pval 1).
compute NewPVal=min(Pval, LeadPval).
Using case numbers
You can use case numbers (line numbers) in calculations and in conditions. For example, the following syntax will let you make different calculations in the first line and the following ones:
if $casenum=1 NewPval=Pval.
if $casenum>1 NewPVal=min(Pval, lag(Pval)).

Related

Example of combinatorial FSM?

On the Wikipedia page of Finite State Machines it shows a graphic of the automata types:
I've never heard of combinational logic being included in automata theory, normally just the Chomsky hierarchy, which starts with FSMs. How then would combinational logic be written using a state machine?
For example, if we have an AND gate, I'd see it in a circuit diagram as something like:
            ______
A ------- |      |
          | AND  |------- C
B ------- |______|
And the states would be: 1(A) & 1(B) --> 1(C), 1&0->0, 0&1->0, 0&0->0. But this involves two initial states rather than one, and also the input to a 'gate' is the combination of two inputs rather than one, so how would this be shown using a FSM? I suppose it could be possible doing something like the following -- with the input symbols being {0,1} and the output {0,1} like a Moore machine.
1 1
s0 ----> s2 -----> s3:1
| | 0
------> s3:0 --0,1--|
0 ^----------|
But this seems a bit useless to me so maybe I'm getting it wrong, what then would be a proper way to model Combinational logic in a state diagram?
Here would be a simpler way to diagram the above, where the Input and Output states are either ON (1) or OFF (0) to make it more intuitive.
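One common way to read the truth table above as a state machine is a Mealy machine with a single state, whose input symbols are the pairs (A, B) and whose output is C. The following is only a sketch under that assumption; the names `AND_MEALY` and `run` are made up for illustration.

```python
from itertools import product

# One state "s"; each transition maps (state, (A, B)) -> (next state, output C).
# Since there is only one state, the output depends on the current input alone,
# which is exactly what "combinational" means.
AND_MEALY = {("s", (a, b)): ("s", a & b) for a, b in product((0, 1), repeat=2)}

def run(machine, inputs, state="s"):
    """Feed a sequence of (A, B) input symbols and collect the outputs."""
    outputs = []
    for symbol in inputs:
        state, out = machine[(state, symbol)]
        outputs.append(out)
    return outputs

print(run(AND_MEALY, [(1, 1), (1, 0), (0, 1), (0, 0)]))
```

Treating the pair (A, B) as one input symbol avoids the need for two initial states or for remembering the first input before seeing the second.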

Design a turing machine to accept {1^n : n is prime number}

Design a Turing Machine to accept {1^n: n is prime number}.
I have this homework to build a recognizer Turing Machine that accepts if the number of 1s on the tape is a prime number. So far I have no idea how to approach the prime number part.
How should I go about this?
Because we're making a Turing machine and we haven't explicitly said we care about performance, odds are we just care about showing that TMs can solve this problem - so, any solution, no matter how dumb, should suffice. What is a correct, if needlessly tedious, way to show that a number p in unary format (i.e., 1^p) is prime? One way is to check whether the number of 1's is evenly divisible by any number between 2 and p - 1, inclusive: if it is, p is composite. This is actually pretty easy to do for a Turing machine. Since the problem doesn't tell us not to, we can make it even simpler by using a multi-tape Turing machine for our construction.
Let the input be on tape #1 and use tape #2 to record the current thing we are trying to divide the input by. Before we begin, we can verify that our p is greater than 2, as follows:
1. See if the current tape square is 1; if so, move right, else halt-reject since 0 is not prime.
2. See if the current tape square is 1; if so, move right, else halt-reject since 1 is not prime.
3. See if the current tape square is 1; if so, reset the tape head and continue on with our division process, knowing p > 2; else, halt-accept since 2 is prime.
If we're continuing at this point, that means we've verified we are looking at the unary encoding of a number greater than 2. We need to do this because 2 is the first number for which we need to check divisibility, and we don't want to say 2 is composite since 2 divides it. At this stage, we can write 11 (unary 2) on tape #2. If you like, you can do this as you are resetting the tape head as mentioned above. Otherwise, you can use some new states specifically for that part of the setup.
We are now looking at a TM configuration like this:
#1111111111111111111111#
^
#11#
^
We want to see if the number represented on the second tape evenly divides the number represented on the first tape. To do this, we can "cross out" numbers on the first tape repeatedly, in groups the size of the second tape, until we run out of numbers on the first tape. If we run out in the middle of crossing out a whole group, then the number represented on the first tape is not evenly divisible by the number represented on the second tape, and we can proceed to check increasing numbers on the second tape. If we run out after having crossed out an entire group, then it is evenly divisible by a number other than 1 and itself, so we can halt-reject as the number is not prime. Processing our example would look like:
=> #1111111# => #x111111# => #xx11111# => #xx11111#
^ ^ ^ ^
#11# #11# #11# #11#
^ ^ ^ ^
=> #xx11111# => #xx11111# => #xx11111# => #xxx1111#
^ ^ ^ ^
#11# #11# #11# #11#
^ ^ ^ ^
=> #xxxx111# => ... reset => #xxxx111# => ... cross
^ tape 2 ^ off another
#11# back to #11# pair of 1s
^ head ... ^ ...
=> #xxxxxx1# => #xxxxxxx#
^ ^
#11# #11#
^ ^
At this stage we see a 1 on the second tape and a blank on the first tape; this means the number was not divisible by our current guess. If we had seen blank/blank, we could have halt-rejected immediately. Instead, we need to continue checking larger possible divisors. At this stage we need to:
1. Reset the first tape head to the beginning of the input, replacing x's with 1's again.
2. Add an extra 1 to the second tape to increment its value, then reset the head to the beginning of its value.
3. Repeat the divisibility check described above.
If we continue this process, we will eventually find a divisor of the number represented on the input tape, if that number is composite. As it stands, however, we will also halt-reject on prime numbers: when the second tape grows to the same number as the input, we will find that the prime number evenly divides itself. We need to check for this. A good place would be between steps 2 and 3 in the last set of 3 steps above: we can compare tapes #1 and #2 and see if they match exactly. This would be just like the divisibility check, but it would cross off at most one copy of tape #2 from tape #1, and it would halt-accept if it got to blank/blank rather than halt-rejecting.
Obviously, there are a lot of details to fill out if you want to formally define the TM by giving its transition table. But, it can be done and a procedure like the one outlined here can be used to solve this problem. Again, this is not the most efficient way to solve this, but it is a way to solve it, which generally is good enough when looking for TMs for some problem.
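The procedure outlined above can be sketched at a high level in Python. This is not a transition table: list lengths and index arithmetic stand in for the head movements a real TM would need, and the function name `is_unary_prime` is made up for illustration.

```python
def is_unary_prime(tape1: str) -> bool:
    """Mimic the two-tape trial-division procedure on a unary input like '1111111'."""
    cells = list(tape1)
    if any(c != "1" for c in cells):
        raise ValueError("input must consist of 1s only")
    # Setup: reject 0 and 1, accept 2 outright.
    if len(cells) < 2:
        return False
    if len(cells) == 2:
        return True
    tape2 = ["1", "1"]  # current trial divisor, starting at unary 2
    while True:
        # Equality check: the divisor has reached the input itself,
        # so no smaller divisor was found and the number is prime.
        if len(tape2) == len(cells):
            return True
        # Divisibility check: cross off 1s on tape 1 in groups the size of tape 2.
        head2 = 0
        for head1 in range(len(cells)):
            cells[head1] = "x"
            head2 = (head2 + 1) % len(tape2)  # wraps to 0 after a whole group
        if head2 == 0:
            return False  # ran out exactly at a group boundary: divisible
        # Restore tape 1, increment the divisor on tape 2, and try again.
        cells = ["1"] * len(cells)
        tape2.append("1")

print([n for n in range(20) if is_unary_prime("1" * n)])
```

The equality check is placed before each divisibility check, which has the same effect as doing it right after the divisor is incremented.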

AWK (or similar) - change 2 lines below the matching pattern

I have a problem that I think is easiest to solve with awk, but I can't wrap my head around it.
Inside a file I have repeating output like this:
....
Name="BgpIpv4RouteConfig_XXX">
<Ipv4NetworkBlock id="13726"
StartIpList="x.y.z.t"
PrefixLength="30"
NetworkCount="10000"
... other output
then this block will repeat.
a) I want to match on BgpIpv4Route.*, then skip 2 lines (the "n" keyword of awk), and on reaching PrefixLength:
- either replace it with a random value in (25,30)
or
- better, but I guess harder (I have no idea how to keep track of what was used and loop over /25../30): first occurrence /25, second one /26, and so on up to /30, then roll back to /25
b) Then, on the next line, recalculate NetworkCount from the new PrefixLength value as 65536 / 2^(32 - PrefixLength)
e.g. if PrefixLength in this occurrence was replaced with /25, then NetworkCount on the following line = 65536 / 2^7 = 65536 / 128 = 512
I found some examples with inserting/changing a line after one that matched (or with a counter variable X lines below the match) but I got a bit confused with the value generation part and also with the changing of two lines where one is depending on the other.
Not sure I made any sense...my head is a bit overwhelmed with what I'm finding everywhere right now.
Thanks in advance!
This should do:
$ awk 'BEGIN {q="\""; FS=OFS="="; n=split("25=26=27=28=29=30",ps)}
/BgpIpv4Route/ {c=c%n+1}
/PrefixLength/ {$2=q ps[c] q}
/NetworkCount/ {$2=q 65536/2^(32-ps[c]) q}1' file
Perhaps minimize computation by changing 65536/2^(32-ps[c]) to 2^(ps[c]-16).
If there are free-standing PrefixLength and NetworkCount attributes, you may need to qualify them for each BgpIpv4Route context.
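For comparison, the same cycling and recalculation can be sketched in Python. This is an illustrative equivalent, not a replacement for the awk one-liner; the function name `rewrite_blocks` and the sample lines are assumptions, and it only rewrites attributes that appear after a BgpIpv4Route line.

```python
import re
from itertools import cycle

def rewrite_blocks(lines, prefixes=(25, 26, 27, 28, 29, 30)):
    """Cycle PrefixLength through `prefixes` per block and recompute NetworkCount."""
    next_prefix = cycle(prefixes)
    current = None  # prefix chosen for the block we are in, if any
    out = []
    for line in lines:
        if "BgpIpv4Route" in line:
            current = next(next_prefix)  # new block: take the next prefix
        elif current is not None:
            if "PrefixLength=" in line:
                line = re.sub(r'PrefixLength="[^"]*"', f'PrefixLength="{current}"', line)
            elif "NetworkCount=" in line:
                count = 65536 // 2 ** (32 - current)  # same value as 2 ** (current - 16)
                line = re.sub(r'NetworkCount="[^"]*"', f'NetworkCount="{count}"', line)
        out.append(line)
    return out

# Hypothetical sample: seven blocks, so the prefix wraps around to 25 once.
block = ['Name="BgpIpv4RouteConfig_XXX">', 'StartIpList="x.y.z.t"',
         'PrefixLength="30"', 'NetworkCount="10000"']
result = rewrite_blocks(block * 7)
print("\n".join(result[:4]))
```

Keeping the chosen prefix in one variable is what lets the NetworkCount line depend on the PrefixLength line before it.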

Multiset Partition Using Linear Arithmetic and Z3

I have to partition a multiset into two sets whose sums are equal. For example, given the multiset:
1 3 5 1 3 -1 2 0
I would output the two sets:
1) 1 3 3
2) 5 -1 2 1 0
both of which sum to 7.
I need to do this using Z3 (smt2 input format) and "Linear Arithmetic Logic", which is defined as:
formula : formula /\ formula | (formula) | atom
atom : sum op sum
op : = | <= | <
sum : term | sum + term
term : identifier | constant | constant identifier
I honestly don't know where to begin with this and any advice at all would be appreciated.
Regards.
Here is an idea:
1- Create a 0-1 integer variable c_i for each element. The idea is that c_i is 0 if the element is in the first set, and 1 if it is in the second set. You can accomplish that by asserting 0 <= c_i and c_i <= 1.
2- The sum of the elements in the first set can be written as 1*(1 - c_1) + 3*(1 - c_2) + ...
3- The sum of the elements in the second set can be written as 1*c_1 + 3*c_2 + ...
4- Assert that the two sums are equal and ask the solver for a model.
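The 0-1 encoding above can be written out as SMT-LIB2 text. The following Python sketch only generates that text and does not call Z3; the function names `smt_const` and `partition_smt2`, the variable names c0, c1, ..., and the choice of the QF_LIA logic are assumptions made for illustration.

```python
def smt_const(n: int) -> str:
    """SMT-LIB2 has no negative literals; write -1 as (- 1)."""
    return str(n) if n >= 0 else f"(- {-n})"

def partition_smt2(items):
    """Emit SMT-LIB2 asserting that the two sides of a 0-1 split have equal sums."""
    lines = ["(set-option :produce-models true)", "(set-logic QF_LIA)"]
    for i in range(len(items)):
        # c_i = 0 puts element i in the first set, c_i = 1 in the second.
        lines.append(f"(declare-const c{i} Int)")
        lines.append(f"(assert (<= 0 c{i}))")
        lines.append(f"(assert (<= c{i} 1))")
    first = " ".join(f"(* {smt_const(a)} (- 1 c{i}))" for i, a in enumerate(items))
    second = " ".join(f"(* {smt_const(a)} c{i})" for i, a in enumerate(items))
    # The leading 0 keeps each (+ ...) well-formed even for a single element.
    lines.append(f"(assert (= (+ 0 {first}) (+ 0 {second})))")
    lines += ["(check-sat)", "(get-model)"]
    return "\n".join(lines)

text = partition_smt2([1, 3, 5, 1, 3, -1, 2, 0])
print(text)
```

Saving the printed text to a file and passing it to the solver should report sat together with values for each c_i, from which the two sets can be read off.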
While SMT-Lib2 is quite expressive, it's not the easiest language to program in. Unless you have a hard requirement to code directly in SMTLib2, I'd recommend looking into other languages that have higher-level bindings to SMT solvers. For instance, both Haskell and Scala have libraries that allow you to script SMT solvers at a much higher level. Here's how to solve your problem using Haskell, for instance: https://gist.github.com/1701881.
The idea is that these libraries allow you to code at a much higher level, and then perform the necessary translation and querying of the SMT solver for you behind the scenes. (If you really need to get your hands onto the SMTLib encoding of your problem, you can use these libraries as well, as they typically come with the necessary API to dump the SMTLib they generate before querying the solver.)
While these libraries may not offer everything that Z3 gives you access to via SMTLib, they are much easier to use for most practical problems of interest.

Constrained Sequence to Index Mapping

I'm puzzling over how to map a set of sequences to consecutive integers.
All the sequences follow this rule:
A_0 = 1
A_n >= 1
A_n <= max(A_0 .. A_n-1) + 1
I'm looking for a solution that will be able to, given such a sequence, compute an integer for doing a lookup into a table and, given an index into the table, generate the sequence.
Example: for length 3, there are 5 valid sequences. A fast function for doing the following mapping (preferably in both directions) would be a good solution
1,1,1 0
1,1,2 1
1,2,1 2
1,2,2 3
1,2,3 4
The point of the exercise is to get a packed table with a 1-1 mapping between valid sequences and cells.
The size of the set is bounded only by the number of unique sequences possible.
I don't know now what the length of the sequence will be but it will be a small, <12, constant known in advance.
I'll get to this sooner or later, but thought I'd throw it out for the community to have "fun" with in the meantime.
these are different valid sequences
1,1,2,3,2,1,4
1,1,2,3,1,2,4
1,2,3,4,5,6,7
1,1,1,1,2,3,2
these are not
1,2,2,4
2,
1,1,2,3,5
Related to this
There is a natural indexing of the sequences, but it is not so easy to calculate.
Let us look at A_n for n>0, since A_0 = 1.
Indexing is done in 2 steps.
Part 1:
Group sequences by the places where A_n = max(A_0 .. A_n-1) + 1. Call these places steps.
The steps hold consecutive numbers (2,3,4,5,...).
On a non-step place k we can put any number from 1 up to the current maximum, i.e. one more than the number of steps with index less than k.
Each group can be represented as a binary string where 1 is a step and 0 a non-step. E.g. 001001010 means the group 112aa3b4c, with a<=2, b<=3, c<=4. Because groups are indexed by binary numbers, there is a natural indexing of groups, from 0 to 2^length - 1. Let's call the value of a group's binary representation the group order.
Part 2:
Index sequences inside a group. Since a group defines the step positions, only the numbers on non-step positions are variable, and they vary within defined ranges. With that, it is easy to index a sequence inside its group, using the lexicographical order of the variable places.
It is easy to calculate the number of sequences in one group. It is a number of the form 1^i_1 * 2^i_2 * 3^i_3 * ....
Combining:
This gives a 2 part key: <Steps, Group>, which then needs to be mapped to the integers. To do that, we have to find how many sequences are in groups that have order less than some value. For that, let's first find how many sequences are in groups of a given length. That can be computed by passing through all groups and summing the number of sequences, or similarly with a recurrence. Let T(l, n) be the number of sequences of length l (A_0 is omitted) where the maximal value of the first element can be n+1. Then the following holds:
T(l,n) = n*T(l-1,n) + T(l-1,n+1)
T(1,n) = n + 1
Because l + n <= sequence length + 1, there are ~sequence_length^2/2 T(l,n) values, which can be easily calculated.
Next is to calculate the number of sequences in groups of order less than or equal to a given value. That can be done by summing T(l,n) values. E.g. the number of sequences in groups with order <= 1001010 binary is equal to
T(7,1) + # for 1000000
2^2 * T(4,2) + # for 001000
2^2 * 3 * T(2,3) # for 010
Optimizations:
This will give a mapping but the direct implementation for combining the key parts is >O(1) at best. On the other hand, the Steps portion of the key is small and by computing the range of Groups for each Steps value, a lookup table can reduce this to O(1).
I'm not 100% sure about the formula above, but it should be something like it.
With these remarks and the recurrence, it is possible to write the functions sequence -> index and index -> sequence. But it is not so trivial :-)
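As a concrete sketch of sequence -> index and index -> sequence, the following Python uses plain lexicographic ranking rather than the two-part <Steps, Group> key described above, driven by a recurrence of the same shape. The function names `completions`, `rank`, and `unrank` are made up for illustration, and the numbering it produces matches the length-3 table in the question.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def completions(remaining: int, current_max: int) -> int:
    """Number of ways to append `remaining` more terms when the running max is `current_max`."""
    if remaining == 0:
        return 1
    # The next term either stays within 1..current_max or raises the max by one.
    return (current_max * completions(remaining - 1, current_max)
            + completions(remaining - 1, current_max + 1))

def rank(seq):
    """Lexicographic index of a valid sequence among all valid sequences of its length."""
    index, current_max = 0, 1
    for pos in range(1, len(seq)):
        remaining = len(seq) - pos - 1
        # Every smaller value at this position is <= current_max, so it keeps the max.
        index += (seq[pos] - 1) * completions(remaining, current_max)
        current_max = max(current_max, seq[pos])
    return index

def unrank(index, length):
    """Inverse of rank: rebuild the sequence of the given length from its index."""
    seq, current_max = [1], 1
    for pos in range(1, length):
        remaining = length - pos - 1
        for value in range(1, current_max + 2):
            block = completions(remaining, max(current_max, value))
            if index < block:
                seq.append(value)
                current_max = max(current_max, value)
                break
            index -= block
    return seq

print([unrank(i, 3) for i in range(completions(2, 1))])
```

Both directions take time proportional to the sequence length times the largest value, and the table of `completions` values is small for lengths under 12.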
I think a hash without sorting should be the thing.
As A_0 always starts with 1, maybe we can think of the sequence as a number in base 12 and use its base 10 value as the key for the lookup. (Still not sure about this.)
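This is a Python function which can do the job for you, assuming you have these values stored in a file and you pass the lines to the function:
def valid_lines(lines):
    """Yield each line, as a list of ints, that follows the sequence rule."""
    for line in lines:
        try:
            seq = [int(tok) for tok in line.strip().split(",")]
        except ValueError:
            continue  # skip blank or non-numeric lines such as "2,"
        # A_0 must be 1; every later term lies between 1 and the running max + 1
        if seq[0] == 1 and all(1 <= a <= max(seq[:i]) + 1
                               for i, a in enumerate(seq) if i > 0):
            yield seq

with open('/tmp/numbers.txt') as f:
    for valid_line in valid_lines(f):
        print(valid_line)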
Given the sequence, I would sort it, then use the hash of the sorted sequence as the index of the table.
