SPSS automated variable labels with leading zeros - spss

What I am struggling with is adapting one of the syntax macros from http://www.spsstools.net/.
It was intended to change labels of "many-many" variables that do not have the leading zeros, but my variables do have those:
DATA LIST LIST /id.
BEGIN DATA
1
END DATA.
NUMERIC set01sub1 TO set01sub4.
* but the intended variable names are set01sub01 TO set01sub04 (with leading zeros and going over 10).
SET MPRINT=yes.
DEFINE !label (lab=!TOKENS(1) /stem=!TOKENS(1) /nb1=!TOKENS(1) /nb2=!TOKENS(1))
!DO !cnt=!nb1 !TO !nb2
!LET !var=!CONCAT(!stem,!cnt)
!LET !labe=!QUOTE(!CONCAT(!UNQUOTE(!lab),!cnt))
VARIABLE LABEL !var !labe.
!DOEND.
!ENDDEFINE.
!label lab='Set 1, subset ' stem=set01sub nb1=1 nb2=4.
I naively tried to use !STRING(...,N2):
!LET !labe=!QUOTE(!CONCAT(!UNQUOTE(!lab),!STRING(!cnt,N2)))
but this didn't work as expected.
My variables are:
subID
rvnAns_s01m01 TO rvnAns_s01m12
rvnAns_s02m01 TO rvnAns_s02m36
rvnAns_s03m01 TO rvnAns_s03m36
rvnEva_s01m01 TO rvnEva_s01m12
rvnEva_s02m01 TO rvnEva_s02m36
rvnEva_s03m01 TO rvnEva_s03m36
and the intended labels are:
"Subject ID"
"RAPM, Series 01, Matrix 01 answer"
"RAPM, Series 01, Matrix 02 answer"
...
"RAPM, Series 01, Matrix 12 answer"
"RAPM, Series 02, Matrix 01 answer"
"RAPM, Series 02, Matrix 02 answer"
...
"RAPM, Series 02, Matrix 36 answer"
"RAPM, Series 03, Matrix 01 answer"
"RAPM, Series 03, Matrix 02 answer"
...
"RAPM, Series 03, Matrix 36 answer"
and
"RAPM, Series 01, Matrix 01 answer evaluation"
"RAPM, Series 01, Matrix 02 answer evaluation"
...
"RAPM, Series 01, Matrix 12 answer evaluation"
"RAPM, Series 02, Matrix 01 answer evaluation"
"RAPM, Series 02, Matrix 02 answer evaluation"
...
"RAPM, Series 02, Matrix 36 answer evaluation"
"RAPM, Series 03, Matrix 01 answer evaluation"
"RAPM, Series 03, Matrix 02 answer evaluation"
...
"RAPM, Series 03, Matrix 36 answer evaluation"
I would be very grateful for any help or suggestions on how to achieve such result.

If you install the Python Essentials via the SPSS Community website (www.ibm.com/developerworks/spssdevcentral), the following program will convert the variable names.
It makes three assumptions:
1) None of the names has a form with just a leading zero, e.g., x0y1. (That could be addressed with a little more complexity.)
2) None of the renames will result in a name collision.
3) None of the expanded names will exceed the maximum length for a name (64 bytes).
Explanation below the program.
begin program.
import spss, re
# Walk every variable; put a 0 between a nondigit and a following nonzero digit.
for v in range(spss.GetVariableCount()):
    vname = spss.GetVariableName(v)
    vnamenew = re.sub(r"(\D)([1-9])", r"\g<1>0\g<2>", vname)
    if vname != vnamenew:
        spss.Submit("rename variables (%s=%s)" % (vname, vnamenew))
        print("%s -> %s" % (vname, vnamenew))
end program.
This program iterates through all the variable names. For each one it looks for every occurrence of a nondigit followed by a nonzero digit, replaces it with nondigit-0-digit, and then generates and runs a rename variables command.
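One caveat with the pattern as written: it also matches the first digit of a two-digit number, so a name like set01sub10 would become set01sub010. The sketch below is a possible variant, not part of the original answer, that adds a negative lookahead so only lone single digits are padded. The helper name pad_single_digits is hypothetical, and the code runs on plain strings outside SPSS.

```python
import re

# Hypothetical variant of the pattern: a nonzero digit that follows a nondigit
# and is NOT followed by another digit.
PAD_PATTERN = re.compile(r"(\D)([1-9])(?!\d)")

def pad_single_digits(name):
    """Insert a 0 before lone single digits; leave multi-digit numbers alone."""
    return PAD_PATTERN.sub(r"\g<1>0\g<2>", name)

for name in ["set01sub1", "set01sub4", "set01sub10"]:
    print(name, "->", pad_single_digits(name))
```

The same assumptions about name collisions and the 64-byte name limit still apply.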

As I said in my comment on Cross Validated, your code does work for the given sample if you supply the stem token with the leading zero, e.g. stem=set01sub0.
If your ranges go past 9, though, I presume you won't have a leading zero for values of 10 and above. Below is an example macro that uses conditional evaluation to concatenate a leading zero for values below 10. If you potentially have more values (e.g. going into the 100s, and so needing two leading zeros), this would need to be amended.
DATA LIST LIST /id.
BEGIN DATA
1
END DATA.
NUMERIC set01sub01 TO set01sub15.
DEFINE !label (lab=!TOKENS(1) /stem=!TOKENS(1) /nb1=!TOKENS(1) /nb2=!TOKENS(1))
!DO !cnt=!nb1 !TO !nb2
!IF (!LENGTH(!cnt) = 1) !THEN
!LET !cnt0 = !CONCAT("0",!cnt)
!ELSE
!LET !cnt0 = !cnt
!IFEND
!LET !var=!CONCAT(!stem,!cnt0)
!LET !labe=!QUOTE(!CONCAT(!UNQUOTE(!lab),!cnt0))
VARIABLE LABEL !var !labe.
!DOEND.
!ENDDEFINE.
PRESERVE.
SET MPRINT ON.
!label lab='Set 1, subset ' stem=set01sub nb1=1 nb2=15.
RESTORE.
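For the variable names listed in the question (two nested counters, both zero-padded), the same idea can be sketched in Python by building the label commands as strings with "%02d" formatting. This is only an illustration under assumptions: the function name label_commands is made up, the series sizes (12, 36, 36) are taken from the question, and the code merely builds the command text. Inside SPSS with the Python Essentials installed, each string could presumably be passed to spss.Submit.

```python
def label_commands(prefix, suffix, series_sizes):
    """Build VARIABLE LABELS commands for names like rvnAns_s01m01."""
    commands = []
    for series, n_matrices in series_sizes:
        for matrix in range(1, n_matrices + 1):
            # %02d pads single digits with a leading zero
            name = "%s_s%02dm%02d" % (prefix, series, matrix)
            label = "RAPM, Series %02d, Matrix %02d %s" % (series, matrix, suffix)
            commands.append('VARIABLE LABELS %s "%s".' % (name, label))
    return commands

# (series number, number of matrices) pairs as given in the question
sizes = [(1, 12), (2, 36), (3, 36)]
commands = (label_commands("rvnAns", "answer", sizes)
            + label_commands("rvnEva", "answer evaluation", sizes))
print(commands[0])
print(commands[-1])
```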

Savitzky - Golay filter for 2D Matrices

I am doing some research about implementing a Savitzky-Golay filter for images. As far as I have read, the main application of this filter is signal processing, e.g. smoothing audio files.
The idea is to fit a polynomial through a defined neighbourhood around a point P(i) and set this point to its new value P_new(i) = polynomial(i).
The problem in 2D-space is - in my opinion - that there is not only one direction to do the fitting. You can use different "directions" to find a polynomial. Like for
[51 52 11 33 34]
[41 42 12 24 01]
[01 02 PP 03 04]
[21 23 13 43 44]
[31 32 14 53 54]
It could be:
[01 02 PP 03 04], (horizontal)
[11 12 PP 23 24], (vertical)
[51 42 PP 43 54], (diagonal)
[41 42 PP 43 44], (semi-diagonal?)
but also
[41 02 PP 03 44], (semi-diagonal as well)
(see my illustration)
So my question is: does the Savitzky-Golay filter even make sense in 2D space, and if so, is there any defined generalized form of this filter for higher dimensions and larger filter masks?
Thank you !
A first option is to use SG filtering in a separable way, i.e. filtering once along the rows, then a second time along the columns.
A second option is to rewrite the equations with a bivariate polynomial (bicubic, for instance) and solve for the coefficients by least squares.
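A minimal sketch of the first (separable) option, under stated assumptions: a 5-point window with a quadratic/cubic fit, whose usual smoothing weights are (-3, 12, 17, 12, -3)/35, and borders handled by repeating the edge value (an arbitrary choice). The function names are illustrative only, and the image is a plain list of lists.

```python
# 5-point Savitzky-Golay smoothing weights (quadratic/cubic fit)
SG5 = [-3 / 35.0, 12 / 35.0, 17 / 35.0, 12 / 35.0, -3 / 35.0]

def smooth_1d(values, kernel=SG5):
    """Apply the kernel along one row or column, repeating edge values at the borders."""
    half = len(kernel) // 2
    last = len(values) - 1
    out = []
    for i in range(len(values)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - half, 0), last)  # clamp index into the valid range
            acc += weight * values[j]
        out.append(acc)
    return out

def smooth_2d(image):
    """Separable filtering: smooth every row, then every column of the result."""
    rows = [smooth_1d(row) for row in image]
    cols = [smooth_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

# A tilted plane: away from the borders the filter should leave it unchanged.
image = [[2.0 * x + 3.0 * y for x in range(7)] for y in range(7)]
smoothed = smooth_2d(image)
print(image[3][3], round(smoothed[3][3], 6))
```

Because the fit reproduces low-order polynomials exactly, a plane is a convenient sanity check; pixels within two positions of the border are affected by the edge handling.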

What is the significance of the code at right of variable declaration in COBOL?

I was studying COBOL code and I did not understand the number at the right of the code line:
007900 03 EXAMPLE-NAME PIC S9(17) COMP-3. EB813597
The first number is the position of that line in the code, the second is the column position (like how many 'tabs' you are using), and the third is the type of the variable, but the fourth (COMP-3) and mainly the last (EB813597) I did not understand.
What does it mean?
Columns 73 through 80 are ignored by the compiler, so EB813597 is ignored. It could be a change ID from the last time the line was changed, or have some site-specific meaning, e.g. EB could be the initials of the person who last changed it.
Comp-3 is the numeric type. It is a bit like using int or double in C/Java. In Comp-3 (packed decimal), 123 is stored as x'123c'. Alternatives to Comp-3 include Comp (normally a big-endian binary integer) and Comp-5 (like int / long in C).
007900 03 EXAMPLE-NAME PIC S9(17) COMP-3. EB813597
(a) (b) Field-Name (c) (d) Usage (numeric type)
a - line number, ignored by the compiler
b - level number; it provides a method of grouping fields together
01 Group.
03 Field-1 ...
03 Field-2 ...
Field-1 and Field-2 belong to Group. It is a bit like a struct in C:
struct {
int field_1;
int field_2;
...
}
c - PIC (picture) tells us that the field's picture follows
d - the field's picture; in this case it is a signed field with 17 decimal digits
Comp-3 - usage, i.e. how the field is stored
So in summary EXAMPLE-NAME is a Signed numeric field with 17 decimal digits and it is stored as Comp-3 (packed decimal).
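To make the packed-decimal layout concrete, here is a small Python sketch that encodes an integer the way the answer describes: one decimal digit per half-byte, with the sign in the last half-byte (C for positive, D for negative). The function name pack_comp3 is hypothetical; this only illustrates the byte layout and is not how a COBOL runtime is implemented.

```python
def pack_comp3(value, digits):
    """Encode an integer as signed packed decimal with the given number of digits."""
    text = str(abs(value)).rjust(digits, "0")
    if len(text) > digits:
        raise ValueError("value does not fit in %d digits" % digits)
    sign = "D" if value < 0 else "C"   # C = positive, D = negative
    nibbles = text + sign
    if len(nibbles) % 2:               # pad on the left to a whole number of bytes
        nibbles = "0" + nibbles
    return bytes.fromhex(nibbles)

print(pack_comp3(123, 3).hex())        # the x'123c' example from the answer
print(len(pack_comp3(0, 17)))          # bytes used by a PIC S9(17) COMP-3 field
```

With 17 digits plus the sign there are 18 half-bytes, so the field in the question occupies 9 bytes.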

VARSTOCASES (in SPSS) function with unequal spacing of waves

My problem with the VARSTOCASES is that I'm unable to deal with unequal spacing of waves in longitudinal data (I'm using the NLSY79). My dependent variable (log of wage) is not available for all years. But with R, I can easily deal with that using a syntax like this :
ld = reshape(d, varying = c("logwage1989", "logwage1990", "logwage1991", "logwage1992", "logwage1993", "logwage1994", "logwage1996", "logwage1998", "logwage2000", "logwage2002", "logwage2004", "logwage2006", "logwage2008", "logwage2010"), v.names = "logwage", timevar = "year", times = c("1989", "1990", "1991", "1992", "1993", "1994", "1996", "1998", "2000", "2002", "2004", "2006", "2008", "2010"), direction = "long")
And in SPSS, what I do is something like this :
VARSTOCASES
/make logwage from logwage1989 logwage1990 logwage1991 logwage1992 logwage1993 logwage1994 logwage1996 logwage1998 logwage2000 logwage2002 logwage2004 logwage2006 logwage2008 logwage2010
/index= year(14)
/keep=grade AFQT educmom educdad occupationmom occupationdad familyincome.
In the above, 14 is the total number of waves. And what SPSS outputs is a series of numbers going from 1 to 14. The data is collected once every year first, and then it's collected once every two years. For SPSS, the values 1 and 2 in the year variable correspond to 1989 and 1990 while values 13 and 14 correspond to 2008 and 2010, respectively. And that's the problem.
How would you write the reshape function in SPSS as I did in R ?
On the VARSTOCASES command instead of using a numeric index you can use a string index, which will put the original variable names into the column. This can then be converted to a numeric column of the years.
DATA LIST FREE /logwage1989 logwage1990 logwage1991 logwage1992 logwage1993 logwage1994 logwage1996 logwage1998
logwage2000 logwage2002 logwage2004 logwage2006 logwage2008 logwage2010.
BEGIN DATA.
89 90 91 92 93 94 96 98 00 02 04 06 08 10
END DATA.
VARSTOCASES
/MAKE logwage FROM logwage1989 TO logwage2010
/INDEX=year (logwage).
*Now convert to an actual year.
COMPUTE year = REPLACE(year,"logwage","").
ALTER TYPE year (F4.0).
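The same idea (carry the original variable name along, then strip the stem to recover the year) can be sketched outside SPSS in plain Python. The function name wide_to_long, the "id" key and the sample values are made up for illustration.

```python
def wide_to_long(rows, stem, id_key):
    """Turn columns named <stem><year> into one record per (id, year)."""
    long_rows = []
    for row in rows:
        for name, value in row.items():
            suffix = name[len(stem):]
            if name.startswith(stem) and suffix.isdigit():
                # the year comes from the column name, so unequal spacing is preserved
                long_rows.append({id_key: row[id_key], "year": int(suffix), stem: value})
    return long_rows

# Invented sample data with the two-year gaps from the question
wide = [{"id": 1, "logwage1994": 2.1, "logwage1996": 2.3, "logwage1998": 2.4}]
for record in wide_to_long(wide, "logwage", "id"):
    print(record)
```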

Why K-map has states in sequence of 00,01,11,10 instead of 00,01,10,11?

Why K-map has states in sequence of 00,01,11,10 instead of 00,01,10,11?
It's because in the first sequence, each entry differs in only one bit whereas in the second sequence the transition from 01 to 10 changes two bits which produces a race condition. In asynchronous logic, nothing ever happens at the same time, so 01 to 10 is either 01 00 10 or 01 11 10 and that causes problems.
In the process of simplification, when two minterms that differ in only one bit are ORed, one variable is eliminated, e.g. AB + AB' = A(B + B') = A, because B + B' = 1.
This is because if we write 00 01 10 11, then between 01 and 10 there is a difference of two bits, and as smparkes said, asynchronous logic cannot change two values at the same time, so 00 01 11 10 is the only ordering left. It is the Gray code taken in sequence: the Gray code of 00 is 00, of 01 is 01, of 10 is 11 and of 11 is 10. The K-map is numbered in this way.
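The Gray-code ordering can be checked with a few lines of Python. This sketch uses the standard binary-to-Gray conversion n XOR (n >> 1); the function names are illustrative.

```python
def gray_code(n):
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)

def kmap_order(bits):
    """Row/column labels of a K-map axis with the given number of variables."""
    return [format(gray_code(i), "0%db" % bits) for i in range(2 ** bits)]

def bits_changed(a, b):
    """Number of positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

order = kmap_order(2)
print(order)
# Neighbouring labels, including the wrap-around from last to first, differ in one bit.
print([bits_changed(order[i], order[(i + 1) % len(order)]) for i in range(len(order))])
```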

Any date should be converted to end of the month date in cobol?

I have a requirement where any date (DD.MM.YYYY) should be converted to last date of month (ex: If date is 20.01.1999 then it should convert into 31.01.1999) ?
Exactly what are you having trouble with? COBOL or the algorithm? I'm guessing it's COBOL.
I'm not going to give you a direct answer because you are
obviously learning the language and there is value in working out the specific details
for yourself.
Here are a couple of hints:
Define a date field in WORKING-STORAGE so that you can pick out the day, month and year as separate items. Something like:
01 TEST-DATE.
05 TEST-DAY PIC 99.
05 PIC X.
05 TEST-MONTH PIC 99.
05 PIC X.
05 TEST-YEAR PIC 9999.
Note the unnamed PIC X fields. These contain the day/month/year delimiters. They do not need to be given data names because
you do not need to reference them. Sometimes this type of data item is given
the name FILLER, but the name is optional.
Read up on the EVALUATE statement. Here is a link to
the IBM Enterprise COBOL manual. This description of EVALUATE should be similar in all versions of COBOL.
MOVE the date of interest TO TEST-DATE. Now you can reference the year, month and day as individual items: TEST-DAY, TEST-MONTH and TEST-YEAR.
Use EVALUATE to test the month (TEST-MONTH). If the month is a 30 day month then MOVE 30 to TEST-DAY. Do the same for
31 day months. February is a special case because of leap years. Once you have determined that the month is February,
test TEST-YEAR to determine if it is a leap year
and MOVE 28 or 29 TO TEST-DAY depending on the outcome of the test.
Now TEST-DATE will contain the date you are looking for. MOVE it to wherever it is needed.
You can use FUNCTION INTEGER-OF-DATE, which returns an integer value corresponding to any date. Assume your input date is in ddmmyyyy format and you expect the output in the same format. Let's say the date is 20011999 and you want 31011999. You can follow the steps below. (Note that INTEGER-OF-DATE expects its argument in yyyymmdd form and DATE-OF-INTEGER returns that form, so rearrange the fields before and after the calls.)
Increase the month of the input date by one (20021999).
Set the day to 01 and use FUNCTION INTEGER-OF-DATE (01021999).
Subtract one from the integer returned.
Use FUNCTION DATE-OF-INTEGER, which will give you the required result.
Note that you will have to add one more check to handle December.
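The steps above can be sketched in Python with the standard datetime module standing in for INTEGER-OF-DATE and DATE-OF-INTEGER. This is an illustration of the algorithm only, assuming the DD.MM.YYYY input format from the question; the function name month_end is made up.

```python
from datetime import date, timedelta

def month_end(day_string):
    """Return the last day of the month for a DD.MM.YYYY string, in the same format."""
    day, month, year = (int(part) for part in day_string.split("."))
    date(year, month, day)  # raises ValueError if the input is not a real date
    # Step 1 and 2: first day of the following month (December rolls into next year)
    if month == 12:
        first_of_next = date(year + 1, 1, 1)
    else:
        first_of_next = date(year, month + 1, 1)
    # Step 3 and 4: one day earlier is the last day of the original month
    last = first_of_next - timedelta(days=1)
    return "%02d.%02d.%04d" % (last.day, last.month, last.year)

print(month_end("20.01.1999"))
print(month_end("20.12.2005"))
```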
Here you go!
IDENTIFICATION DIVISION.
PROGRAM-ID. STACK2.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 WS-DATE PIC X(10).
01 WORK-DATE.
05 WORK-DAY PIC 9(2).
05 PIC X.
05 WORK-MONTH PIC 9(2).
05 PIC X.
05 WORK-YEAR PIC 9(4).
01 MONTH-31 PIC 9(2).
88 IS-MONTH-31 VALUES 01, 03, 05, 07, 08, 10, 12.
88 IS-MONTH-30 VALUES 04, 06, 09, 11.
01 WS-C PIC 9(4) VALUE 0.
01 WS-D PIC 9(4) VALUE 0.
PROCEDURE DIVISION.
ACCEPT WS-DATE.
MOVE WS-DATE TO WORK-DATE.
DISPLAY 'ACTUAL TEST-DATE: ' WORK-DATE.
MOVE WORK-MONTH TO MONTH-31.
EVALUATE TRUE
WHEN IS-MONTH-31
MOVE 31 TO WORK-DAY
WHEN IS-MONTH-30
MOVE 30 TO WORK-DAY
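WHEN OTHER
*> February: leap years are divisible by 4, but century years
*> only count when they are also divisible by 400
IF FUNCTION MOD (WORK-YEAR, 4) = 0 AND
(FUNCTION MOD (WORK-YEAR, 100) NOT = 0 OR
FUNCTION MOD (WORK-YEAR, 400) = 0)
MOVE 29 TO WORK-DAY
ELSE
MOVE 28 TO WORK-DAY
END-IF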
END-EVALUATE.
DISPLAY 'MODIFIED TEST-DATE: ' WORK-DATE
STOP RUN.
This solution goes hand in hand with @NealB's answer.
The procedure "compute-month-end-date" does not require any checks for leap year or December.
identification division.
program-id. last-day.
data division.
working-storage section.
1 test-date.
88 test-1 value "20.01.1999".
88 test-2 value "20.02.2004".
88 test-3 value "20.12.2005".
2 dd pic 99.
2 pic x.
2 mm pic 99.
2 pic x.
2 yyyy pic 9999.
1 month-end-date binary pic 9(8) value 0.
procedure division.
begin.
set test-1 to true
perform run-test
set test-2 to true
perform run-test
set test-3 to true
perform run-test
stop run
.
run-test.
display test-date " to " with no advancing
perform test-date-to-iso
perform compute-month-end-date
perform iso-to-test-date
display test-date
.
compute-month-end-date.
*> get date in following month
compute month-end-date = function
integer-of-date (month-end-date) + 32
- function mod (month-end-date 100)
compute month-end-date = function
date-of-integer (month-end-date)
*> get last day of target month
compute month-end-date = function
integer-of-date (month-end-date)
- function mod (month-end-date 100)
compute month-end-date = function
date-of-integer (month-end-date)
.
test-date-to-iso.
compute month-end-date = yyyy * 10000
+ mm * 100 + dd
.
iso-to-test-date.
move month-end-date to dd
.
end program last-day.
Results:
20.01.1999 to 31.01.1999
20.02.2004 to 29.02.2004
20.12.2005 to 31.12.2005
While this might be a bit daunting to the novice COBOL programmer, there is a simple explanation. The procedure "compute-month-end-date" consists of two identical parts with the exception of the "+32".
Taking the second part first, it subtracts the day of month from the integer of a date giving the integer value for the 'zeroth' day of the month. This is precisely the integer value for the last day of the prior month. The following compute gives the date in 'yyyymmdd' format.
The first part does the same, except that it adds 32 to get a date in the following month, the 1st through the 4th, depending on the number of days in the original month.
Taken together: 19990120 is first changed to 19990201, then to 19990131; 20040220 to 20040303, then to 20040229; and 20051220 to 20060101, then to 20051231.
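The same two-step arithmetic can be reproduced in Python, using date.toordinal and date.fromordinal in place of INTEGER-OF-DATE and DATE-OF-INTEGER. This is only a sketch to show why no leap-year or December check is needed; the function name month_end_plus32 is hypothetical.

```python
from datetime import date

def month_end_plus32(d):
    """Last day of d's month, using the 'day zero plus 32' trick."""
    # Day zero of the month plus 32 always lands in the following month
    in_next_month = date.fromordinal(d.toordinal() - d.day + 32)
    # Day zero of that following month is the last day of the original month
    return date.fromordinal(in_next_month.toordinal() - in_next_month.day)

for d in [date(1999, 1, 20), date(2004, 2, 20), date(2005, 12, 20)]:
    print(d.isoformat(), "to", month_end_plus32(d).isoformat())
```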
