How do I use COMPUTE to create and calculate a series of values in SPSS?

I need to compute 61 new variables using a simple math equation based on 4 sets of 61 existing variables. I know I can write 61 compute statements. Is there a more elegant way of creating these variables? Here's how the 61 statements would look:
COMPUTE score_1 = factor_1 * (a_1 + b_1) + c_1.
...
COMPUTE score_61 = factor_61 * (a_61 + b_61) + c_61.
EXECUTE.
Thanks in advance.
RECODE accepts TO and numbers my new variables automatically (e.g. recode raw1 to raw61 (1=0) (2=1) into a_1 to a_61.). Can I do the same here?

You can use a DO REPEAT structure:
DO REPEAT score=score_1 score_2 ... score_61
/factor = factor_1 factor_2 ... factor_61
/a=a_1 a_2 ... a_61
/b=b_1 b_2 ... b_61
/c=c_1 c_2 ... c_61.
COMPUTE score=factor*(a+b)+c.
END REPEAT.
EXECUTE.
If your variables happen to be stored consecutively in the file (i.e. all the factor variables are adjacent, all the a variables are adjacent, etc.), you can reference them using TO, like this:
/factor = factor_1 TO factor_61
Otherwise, you need to enumerate them one by one. Hope this helps.
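To make clear what DO REPEAT saves you from typing, here is a minimal Python 3 sketch (an illustration, not part of the answer) that builds the 61 equivalent COMPUTE statements as a single syntax string. It assumes the naming pattern from the question (score_i, factor_i, a_i, b_i, c_i); the function name build_compute_syntax is hypothetical. Inside SPSS, a string like this could be passed to spss.Submit from a BEGIN PROGRAM block.

```python
def build_compute_syntax(n: int = 61) -> str:
    """Return n COMPUTE statements (one per index 1..n) followed by EXECUTE.

    Assumes the question's naming pattern: score_i, factor_i, a_i, b_i, c_i.
    """
    lines = [
        f"COMPUTE score_{i} = factor_{i} * (a_{i} + b_{i}) + c_{i}."
        for i in range(1, n + 1)
    ]
    lines.append("EXECUTE.")
    return "\n".join(lines)


syntax = build_compute_syntax()
print(syntax.splitlines()[0])   # COMPUTE score_1 = factor_1 * (a_1 + b_1) + c_1.
print(syntax.splitlines()[-2])  # COMPUTE score_61 = factor_61 * (a_61 + b_61) + c_61.
```

DO REPEAT remains the simpler choice inside pure syntax; generating the statements is mainly useful when the list of names itself has to be computed.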

Related

How to combine several input data from items to one (string) variable in ztree?

In my experiment, participants see 10 items per round; for each item they can tick the checkbox or leave it empty. In the next step, I'd like to create a new variable, e.g. MyInputR1, that holds the values of these checkboxes in the right order as one new number.
My approach so far:
a)
Formatting input data: f1=format(D11,0.2).
Combining the input data and storing the information in a new variable: f = f1 + f2 + f3 + ....
Creating the variable MyInputR1 = stringtonumber(f)
b) Combining the input data (with values 0 or 1): MyInputR1 = D11 + D12 + D13 + D14 + ...
Unfortunately, neither approach works, and z-Tree does not understand what I am trying to do.
Thus my question:
Is it possible to combine / string together input data into 1 new variable, instead of adding it up?
Input data: checkbox with values 0 or 1
in total 10 input variables (D11 - D110)
Looking for a variable that e.g. looks like this: MyInputR1 = 0000011111
Thanks for your help!
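The underlying arithmetic can be illustrated outside z-Tree. The minimal Python sketch below is not z-Tree code, and its function names are hypothetical. It shows that weighting each 0/1 checkbox by a power of ten, instead of simply adding the values, keeps each box in its own decimal digit, and why the leading zeros of 0000011111 disappear once the result is stored as a number rather than as text:

```python
def combine_as_number(checks):
    """Weight the i-th of k checkboxes by 10**(k - 1 - i) so each occupies one digit."""
    k = len(checks)
    return sum(d * 10 ** (k - 1 - i) for i, d in enumerate(checks))


def combine_as_string(checks):
    """Concatenate the digits as text, which keeps leading zeros."""
    return "".join(str(d) for d in checks)


boxes = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # D11 ... D110 from the question
print(combine_as_number(boxes))  # 11111 (leading zeros are lost)
print(combine_as_string(boxes))  # 0000011111
print(sum(boxes))                # 5, which is what plain addition gives
```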

Display polynomials in reverse order in SageMath

I would like to print polynomials in one variable (s) with one parameter (a), say
a·s^3 - s^2 - a^2·s - a + 1.
Sage always displays them in decreasing degree, and I would like to get something like
1 - a - a^2·s - s^2 + a·s^3
to export it to LaTeX. I can't figure out how to do this... Thanks in advance.
As an alternative to string manipulation, one can use the series expansion.
var('a s')
F = a*s^3 - s^2 - a^2*s - a + 1
F.series(s, F.degree(s)+1)
returns
(-a + 1) + (-a^2)*s + (-1)*s^2 + (a)*s^3
which appears to be what you wanted, save for some redundant parentheses.
This works because (a) a power series is ordered from the lowest to the highest degree, and (b) choosing the order of the remainder term greater than the degree of the polynomial ensures that the series is just the polynomial itself.
This is not easy, because the sort order is defined in Pynac, a fork of GiNaC, which Sage uses for its basic symbolic manipulation. However, depending on what you need, it is possible to do it programmatically:
sage: F = 1 + x + x^2
sage: "+".join(map(str,sorted([f for f in F.operands()],key=lambda exp:exp.degree(x))))
'1+x+x^2'
I don't know whether this sort of thing is powerful enough for your needs, though. You may have to traverse the "expression tree" quite a bit, but at least for an example like yours it seems to work.
sage: F = a + a^2*x + x^2 - a*x^2
sage: "+".join(map(str,sorted([f for f in F.operands()],key=lambda exp:exp.degree(x))))
'a+a^2*x+-a*x^2+x^2'
Doing this in a short statement requires a number of Python tricks like this, which are very well worth learning if you are going to use Sage (or Numpy, or pandas, or ...) a fair amount.
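The sort-by-degree idea can be sketched in plain Python without Sage. This is an illustration under a simplifying assumption, not Sage's API: each term is assumed to be already available as a (coefficient string, degree) pair, roughly what F.operands() gives after extracting degrees. The helper name ascending_poly is hypothetical. A stable sort keeps terms of equal degree in their input order, and the awkward "+-" seen in the output above is collapsed into "-":

```python
def ascending_poly(terms, var="s"):
    """Format (coefficient_string, degree) pairs in increasing degree.

    Coefficients are strings so symbolic parameters such as "a" or "-a^2" work.
    """
    parts = []
    for coeff, deg in sorted(terms, key=lambda t: t[1]):  # stable sort by degree
        if deg == 0:
            parts.append(coeff)
            continue
        power = var if deg == 1 else f"{var}^{deg}"
        if coeff == "1":
            parts.append(power)
        elif coeff == "-1":
            parts.append("-" + power)
        else:
            parts.append(f"{coeff}*{power}")
    # Join with " + ", then turn every "+ -" into "- " for readability.
    return " + ".join(parts).replace("+ -", "- ")


terms = [("a", 3), ("-1", 2), ("-a^2", 1), ("1", 0), ("-a", 0)]
print(ascending_poly(terms))  # 1 - a - a^2*s - s^2 + a*s^3
```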

SPSS: How to create sequential variables

I'm very new to SPSS and I have to create variables with similar names.
Specifically, I have to create these variables:
Visit1_microbe1_test1
Visit1_microbe1_result1
Visit1_microbe1_test2
Visit1_microbe1_result2
...
Visit1_microbe2_test1
Visit1_microbe2_result1
Visit1_microbe2_test2
Visit1_microbe2_result2
...
Visit3_microbe1_test1
Visit3_microbe1_result1
...
Visit3_microbe10_test5
Visit3_microbe10_result5
I can do it manually but it will take a lot of time, please help...
There are various commands in SPSS to deal with repetitive tasks such as this.
See for example:
DO REPEAT
VECTOR / LOOP
In this instance SPSS's Macro language is perhaps most apt.
So you may do something like this (it isn't an attempt to answer your exact requirement, but it should give you something to work with and adapt to your needs):
DEFINE !CreateNewVars ().
!DO !i = 1 !TO 5
!DO !j = 2 !TO 10
COMPUTE !CONCAT("Q", !i,"_X", !j)=1.
!DOEND
!DOEND
!ENDDEFINE.
!CreateNewVars.
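As an alternative to the macro, the exact names from the question can be generated in Python and turned into a single NUMERIC command. This is a minimal sketch, assuming 3 visits, 10 microbes and 5 tests per microbe (inferred from the last names in the question) and numeric F8.0 variables; if the test variables should be strings, a STRING command would be needed instead. The function name sequential_names is hypothetical. In SPSS, the resulting string could be submitted with spss.Submit inside BEGIN PROGRAM.

```python
def sequential_names(n_visits=3, n_microbes=10, n_tests=5):
    """Build VisitV_microbeM_testT / VisitV_microbeM_resultT names in the question's order."""
    names = []
    for v in range(1, n_visits + 1):
        for m in range(1, n_microbes + 1):
            for t in range(1, n_tests + 1):
                names.append(f"Visit{v}_microbe{m}_test{t}")
                names.append(f"Visit{v}_microbe{m}_result{t}")
    return names


names = sequential_names()
# One variable name per line keeps the generated command readable.
syntax = "NUMERIC " + "\n  ".join(names) + " (F8.0)."
print(len(names))  # 300
print(names[-1])   # Visit3_microbe10_result5
```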

SPSS macro for splitting single numeric variables to multiple variables

I have a variable named A in my SPSS dataset.
A
--
102102
23453212
142378
2367890654
2345
45
I want to split this variable by 2 lengths and create multiple variables as follows.
A_1 A_2 A_3 A_4 A_5
--- --- --- --- ---
10 21 02
23 45 32 12
14 23 78
23 67 89 06 54
23 45
45
Can anyone write SPSS macro to compute this operation?
Use string manipulation (after converting the numeric variable to a string, if necessary); specifically, SUBSTR lets you extract pairs of digits as you wish.
/* Simulate data */.
data list list / x (f8.0).
begin data.
102102
23453212
142378
2367890654
2345
45
end data.
dataset name dsSim.
If you know the maximum value (in your example, a 10-digit number), you'll need 5 variables to store the pairs of digits, which the following does:
preserve.
set mxwarns 0 /* temporarily suppress warning messages */ .
string #xstr (a10).
compute #xstr=ltrim(string(x,f18.0)).
compute A_1=number(substr(#xstr,1,2), f8.0).
compute A_2=number(substr(#xstr,3,2), f8.0).
compute A_3=number(substr(#xstr,5,2), f8.0).
compute A_4=number(substr(#xstr,7,2), f8.0).
compute A_5=number(substr(#xstr,9,2), f8.0).
exe.
restore.
However, you may prefer to code this more dynamically (using Python), so that the code itself reads the maximum value in the data and creates as many variables as needed.
begin program.
import spss, spssdata
spss.Submit("set mprint on.")
# Get the maximum value of x into a separate aggregated dataset.
spss.Submit("""
dataset declare dsAgg.
aggregate outfile=dsAgg /MaxX=max(x).
dataset activate dsAgg.
""")
# Read the single aggregated value, then close the cursor before submitting more syntax.
curs = spssdata.Spssdata()
maxvalue = curs.fetchone()[0]
curs.close()
# Count digits via the string form (avoids floating-point issues with log10 at powers of 10).
ndigits = len(str(int(maxvalue)))
cmd = """
dataset activate dsSim.
dataset close dsAgg.
preserve.
set mxwarns 0.
string #xstr (a%d).
compute #xstr=ltrim(string(x,f18.0)).
""" % ndigits
# One new variable per pair of digits: B_1 holds digits 1-2, B_2 digits 3-4, and so on.
for i in range(1, (ndigits + 1) // 2 + 1):
    j = (i - 1) * 2 + 1
    cmd += "\ncompute B_%d=number(substr(#xstr,%d,2), f8.0)." % (i, j)
cmd += "\nexe.\nrestore."
spss.Submit(cmd)
spss.Submit("set mprint off.")
end program.
You would need to weigh up the pros and cons of each method to assess which suits you best, given how you expect your data to arrive and how you go on to work with it later. I haven't attempted to wrap either of these in a macro, but that could just as easily be done.
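The pair-splitting logic used by both SPSS versions can be checked in plain Python. This minimal sketch (the function name split_pairs is hypothetical, and it assumes non-negative whole numbers) mirrors substr(#xstr, j, 2) followed by number(): the digits are read left to right in chunks of two, a leading zero in a chunk is dropped because the result is numeric (02 becomes 2), and rows with fewer pairs are padded with None, much as SPSS leaves system-missing values:

```python
def split_pairs(value, width=2):
    """Split the decimal digits of a non-negative integer into numeric chunks, left to right."""
    digits = str(int(value))
    return [int(digits[i:i + width]) for i in range(0, len(digits), width)]


data = [102102, 23453212, 142378, 2367890654, 2345, 45]
n_vars = (len(str(max(data))) + 1) // 2  # 5 pairs for a 10-digit maximum
for x in data:
    row = split_pairs(x)
    print(row + [None] * (n_vars - len(row)))  # pad shorter rows, like system-missing
```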

Moving Average across Variables in Stata

I have a panel data set for which I would like to calculate moving averages across years.
Each year is a variable for which there is an observation for each state, and I would like to create a new variable for the average of every three year period.
For example:
P1947=rmean(v1943 v1944 v1945), P1947=rmean(v1944 v1945 v1946)
I figured I should use a foreach loop with the egen command, but I'm not sure about how I should refer to the different variables within the loop.
I'd appreciate any guidance!
This data structure is quite unfit for purpose. Assuming an identifier variable id, you need to reshape to long format, e.g.
reshape long v, i(id) j(year)
tsset id year
Then a moving average is easy. Use tssmooth or just generate, e.g.
gen mave = (L.v + v + F.v)/3
or (better)
gen mave = 0.25 * L.v + 0.5 * v + 0.25 * F.v
More on why your data structure is quite unfit: Not only would calculation of a moving average need a loop (not necessarily involving egen), but you would be creating several new extra variables. Using those in any subsequent analysis would be somewhere between awkward and impossible.
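The point of the long layout is that lags and leads are taken within each panel. This minimal Python sketch, not Stata code, illustrates that idea. It stores one {year: value} series per panel id, using hypothetical state ids and values, and applies the 0.25/0.5/0.25 weights from gen mave = 0.25 * L.v + 0.5 * v + 0.25 * F.v. A year lacking either neighbour gets None, just as L. or F. would give missing in Stata. The helper name centered_ma is hypothetical.

```python
def centered_ma(series, weights=(0.25, 0.5, 0.25)):
    """Centered 3-point weighted moving average of a {year: value} dict.

    A year whose previous or next year is absent (or whose own value is None) gets None.
    """
    out = {}
    for year, v in series.items():
        prev, nxt = series.get(year - 1), series.get(year + 1)
        if prev is None or v is None or nxt is None:
            out[year] = None
        else:
            out[year] = weights[0] * prev + weights[1] * v + weights[2] * nxt
    return out


# Long layout: one series per panel id (hypothetical data).
panel = {
    "AL": {1943: 1.0, 1944: 2.0, 1945: 3.0, 1946: 5.0},
    "AK": {1943: 4.0, 1944: 4.0, 1945: 7.0, 1946: 7.0},
}
smoothed = {pid: centered_ma(s) for pid, s in panel.items()}
print(smoothed["AL"])  # {1943: None, 1944: 2.0, 1945: 3.25, 1946: None}
```

Passing weights=(1/3, 1/3, 1/3) gives the equal-weight version, (L.v + v + F.v)/3.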
EDIT I'll give a sample loop, while not moving from my stance that it is poor technique. I don't see a reason behind your naming convention whereby P1947 is a mean for 1943-1945; I assume that's just a typo. Let's suppose that we have data for 1913-2012. For means of 3 years, we lose one year at each end.
forval j = 1914/2011 {
local i = `j' - 1
local k = `j' + 1
gen P`j' = (v`i' + v`j' + v`k') / 3
}
That could be written more concisely, at the expense of a flurry of macros within macros. Using unequal weights is easy, as above. The only reason to use egen is that it doesn't give up if there are missings, which the above will do.
FURTHER EDIT
As a matter of completeness, note that it is easy to handle missings without resorting to egen.
The numerator
(v`i' + v`j' + v`k')
generalises to
(cond(missing(v`i'), 0, v`i') + cond(missing(v`j'), 0, v`j') + cond(missing(v`k'), 0, v`k'))
and the denominator
3
generalises to
(!missing(v`i') + !missing(v`j') + !missing(v`k'))
If all values are missing, this reduces to 0/0, or missing. Otherwise, if any value is missing, we add 0 to the numerator and 0 to the denominator, which is the same as ignoring it. Naturally the code is tolerable as above for averages of 3 years, but either for that case or for averaging over more years, we would replace the lines above by a loop, which is what egen does.
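The same numerator/denominator trick can be expressed in a short Python sketch. This is an illustration, not Stata code. Missing values are represented as None, and the yearly data is a hypothetical {year: value} dict. Missing entries add 0 to the numerator and 0 to the count, and an all-missing window returns NaN, which plays the role of Stata's 0/0 giving missing. The helper names are hypothetical.

```python
import math


def mean_ignoring_missing(values):
    """Sum with missings treated as 0, divided by the number of non-missing values.

    Returns NaN when every value is missing (the analogue of Stata's 0/0 -> missing).
    """
    num = sum(0.0 if v is None else v for v in values)
    den = sum(0 if v is None else 1 for v in values)
    return num / den if den else math.nan


def three_year_means(v, first, last):
    """P[j] = mean of v[j-1], v[j], v[j+1] for j in first..last, like the forval loop."""
    return {
        j: mean_ignoring_missing([v.get(j - 1), v.get(j), v.get(j + 1)])
        for j in range(first, last + 1)
    }


v = {1913: 1.0, 1914: None, 1915: 5.0, 1916: None, 1917: None, 1918: None}
P = three_year_means(v, 1914, 1917)
print(P)  # {1914: 3.0, 1915: 5.0, 1916: 5.0, 1917: nan}
```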
There is a user-written program that can do this very easily for you. It is called mvsumm and can be found through findit mvsumm.
xtset id time
mvsumm observations, stat(mean) win(t) gen(new_variable) end
