How to refactor string containing variable names into booleans? - spss

I have an SPSS variable containing lines like:
|2|3|4|5|6|7|8|10|11|12|13|14|15|16|18|20|21|22|23|24|25|26|27|28|29|
Every line starts with pipe, and ends with one. I need to refactor it into boolean variables as the following:
var var1 var2 var3 var4 var5
|2|4|5| 0 1 0 1 1
I have tried to do it with a loop like:
loop # = 1 to 72.
compute var# = SUBSTR(var,2#,1).
end loop.
exe.
My code doesn't work with numbers that are two or more digits long, and it doesn't place the values into their respective variables. I've tried nesting char.substr(var,char.rindex(var,'|') + 1) inside another loop, with no luck, because it still doesn't let me recognize the variable number.
How can I do it?

This looks like a nice job for the DO REPEAT command. However the type conversion is somewhat tricky:
DO REPEAT var#i=var1 TO var72
/i=1 TO 72.
COMPUTE var#i = (CHAR.INDEX(var,CONCAT("|",LTRIM(STRING(i,F2.0)),"|"))>0).
END REPEAT.
Explanation: Let's go from the inside to the outside:
STRING(value,F2.0) converts the numeric value into a string of two characters (with a leading space where the number consists of just one digit), e.g. 2 -> " 2".
LTRIM() removes the leading whitespaces, e.g. " 2" -> "2".
CONCAT() concatenates strings. In the above code it adds the "|" before and after the number, e.g. "2" -> "|2|"
CHAR.INDEX(stringvar,searchstring) returns the position at which the searchstring was found. It returns 0 if the searchstring wasn't found.
CHAR.INDEX(stringvar,searchstring)>0 returns a boolean value indicating if the searchstring was found or not.
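The delimiter-wrapped search described above can also be checked outside SPSS. The following is a minimal Python sketch of the same logic, not SPSS syntax; the function name flags_from_pipes and the n_vars parameter are assumptions made for illustration.

```python
def flags_from_pipes(text, n_vars=72):
    """Return one 0/1 flag per number from 1 to n_vars.

    Wrapping each number in pipes ("|2|") keeps "2" from matching
    inside "|12|" or "|20|", which is what the CONCAT() step achieves.
    """
    return [1 if "|%d|" % i in text else 0 for i in range(1, n_vars + 1)]


flags = flags_from_pipes("|2|4|5|", n_vars=5)
print(flags)  # [0, 1, 0, 1, 1]
```

Because every value in the source string is already surrounded by pipes, no special handling is needed for the first or last number.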

It's easier to do the manipulations in Python than native SPSS syntax.
You can use SPSSINC TRANS extension for this purpose.
/* Example data*/.
data list free / TextStr (a99).
begin data.
"|2|3|4|5|6|7|8|10|11|12|13|14|15|16|18|20|21|22|23|24|25|26|27|28|29|"
end data.
/* defining function to achieve task */.
begin program.
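def runTask(x):
    # Collect the numbers between the pipes, skipping empty fields.
    numbers = [int(i) for i in x.split("|") if i.strip()]
    # One 0/1 flag per position from 1 up to the largest number present.
    answer = [1 if i in numbers else 0 for i in range(1, max(numbers) + 1)]
    return answer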
end program.
/* Run job*/.
spssinc trans result = V1 to V30 type=0 /formula "runTask(TextStr)".
exe.

Related

How to create tables with variable length with string-like keys in Lua

I have a file database. Inside that file I have something like:
DB_A = ...
DB_B = ...
.
.
.
DB_N = ...
I would like to parse the data and group them in lua code like this:
data={}
-- the result after parsing a file
data={
["DB_A"] = {...},
["DB_B"] = {...},
.
.
.
["DB_N"] = {...}
}
In other words, is it possible to create a table inside a table dynamically and assign the key to each table without knowing in advance what the names of the keys will be? (That is something I can only figure out after parsing the data from the database.)
(Just as a note, I am using Lua 5.3.5; also, I apologize that my code resembles C more than Lua!)
Iterating through your input file line-by-line--which can be done with the Lua FILE*'s lines method--you can use string.match to grab the information you are looking for from each line.
#!/usr/bin/lua
local PATTERN = "(%S+)%s?=%s?(%S+)"
local function eprintf(fmt, ...)
io.stderr:write(string.format(fmt, ...))
return
end
local function printf(fmt, ...)
io.stdout:write(string.format(fmt, ...))
return
end
local function make_table_from_file(filename)
local input = assert(io.open(filename, "r"))
local data = {}
for line in input:lines() do
local key, value = string.match(line, PATTERN)
if key then data[key] = value end -- skip lines that do not match PATTERN
end
return data
end
local function main(argc, argv)
if (argc < 1) then
eprintf("Filename expected from command line\n")
os.exit(1)
end
local data = make_table_from_file(argv[1])
for k, v in pairs(data) do
printf("data[%s] = %s\n", k, data[k])
end
return 0
end
main(#arg, arg)
The variable declared at the top of the file, PATTERN, is your capture pattern to be used by string.match. If you are unfamiliar with how Lua's pattern matching works, this pattern looks for a series of non-space characters, followed by an optional space, an equal sign, another optional space, and then another series of non-space characters. The two series of non-space characters are the two captures--key and value--returned by string.match in the function make_table_from_file.
The functions eprintf and printf are my Lua versions of C-style formatted output functions. The former writes to standard error, io.stderr in Lua; and the latter writes to standard output, io.stdout in Lua.
In your question, you give a sample of what your expected output is. Within your table data, you want it to contain keys that correspond to tables as values. Based on the sample input text you provided, I assume the data contained within these tables are whatever comes to the right of the equal signs in the input file--which you represent with .... As I do not know what exactly those ...s represent, I cannot give you a solid example for how to separate that right-hand data into a table. Depending on what you are looking to do, you could take the second variable returned by string.match, which I called value, and further separate it using Lua's string pattern matching. It could look something like this:
...
local function make_table_from_value(val)
-- Split `val` into distinct elements to form a table with `some_pattern`
return {string.match(val, some_pattern)}
end
local function make_table_from_file(filename)
local input = assert(io.open(filename, "r"))
local data = {}
for line in input:lines() do
local key, value = string.match(line, PATTERN)
if key then data[key] = make_table_from_value(value) end -- skip lines that do not match PATTERN
end
return data
end
...
In make_table_from_value, string.match will return some number of elements, based on whatever string pattern you provide as its second argument, which you can then use to create a table by enclosing the function call in curly braces. It will be a table that uses numerical indices as keys--rather than strings or some other data type--starting from 1.
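To make the data flow concrete, here is the same idea sketched in Python rather than Lua, so it is an illustration of the logic and not a drop-in replacement. The comma-separated right-hand side is purely an assumption (the question elides it as ...), and the names LINE_RE and parse_lines are hypothetical.

```python
import re

# Same shape as the Lua pattern "(%S+)%s?=%s?(%S+)":
# key, optional space, "=", optional space, value.
LINE_RE = re.compile(r"(\S+)\s?=\s?(\S+)")


def parse_lines(lines, sep=","):
    """Build {key: [elements]} from "key = value" lines."""
    data = {}
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a "key = value" line, so skip it
        key, value = m.groups()
        # ASSUMPTION: the value is a list separated by `sep`.
        data[key] = value.split(sep)
    return data


sample = ["DB_A = 1,2,3", "not a record", "DB_B = x,y"]
print(parse_lines(sample))
# {'DB_A': ['1', '2', '3'], 'DB_B': ['x', 'y']}
```

The keys are discovered while parsing, which is the point of the question: nothing about them has to be known beforehand.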

Reading a column file of x y z into table in Lua

Been trying to find my way through Lua, so I have a file containing N lines of numbers, 3 per line, it is actually x,y,z coordinates. I could make it a CSV file and use some Lua CSV parser, but I guess it's better if I learn how to do this regardless.
So what would be the best way to deal with this? So far I am able to read each line into a table with the code snippet below, but 1) I don't know if this is a table of strings or of numbers, and 2) if I print tbllinesx[1], it prints the whole line of three numbers. I would like tbllinesx[1][1], tbllinesx[1][2] and tbllinesx[1][3] to correspond to the three numbers on the first line of my file.
local file = io.open("locations.txt")
local tbllinesx = {}
local i = 0
if file then
for line in file:lines() do
i = i + 1
tbllinesx[i] = line
end
file:close()
else
error('file not found')
end
From Programming in Lua https://www.lua.org/pil/21.1.html
You can call read with multiple options; for each argument, the
function will return the respective result. Suppose you have a file
with three numbers per line:
6.0 -3.23 15e12
4.3 234 1000001
... Now you want to print the maximum of each line. You can read all three numbers in a single call to read:
while true do
local n1, n2, n3 = io.read("*number", "*number", "*number")
if not n1 then break end
print(math.max(n1, n2, n3))
end
In any case, you should always consider the alternative of reading the
whole file with option "*all" from io.read and then using
gfind to break it up:
local pat = "(%S+)%s+(%S+)%s+(%S+)%s+"
for n1, n2, n3 in string.gfind(io.read("*all"), pat) do
print(math.max(n1, n2, n3))
end
I'm sure you can figure out how to modify this to put the numbers into table fields on your own.
If you're using three captures you can just use table.pack to create your line table with three entries.
Assuming you only have valid lines in your data file (locations.txt), all you need is to change the line:
tbllinesx[i] = line
to:
tbllinesx[i] = { line:match '(%d+)%s+(%d+)%s+(%d+)' }
This will put each of the three space-delimited numbers into its own spot in a table for each line separately.
Edit: The repeated %d+ part of the pattern will need to be adjusted according to your actual input. %d+ assumes plain integers, you need something more involved for possible minus sign (%-?%d+) and for possible dot (%-?%d-%.?%d+), and so on. Of course the easy way would be to grab everything that is not space (%S+) as a potential number.
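On the first part of the question (strings or numbers): pattern captures come back as strings, so they need an explicit conversion, which in Lua is what tonumber() does. The "grab everything that is not space" approach, with that conversion added, can be sketched as follows. This is a Python illustration of the logic rather than Lua code, and the function name read_xyz is hypothetical.

```python
def read_xyz(lines):
    """Return a list of [x, y, z] rows, one per valid input line."""
    rows = []
    for line in lines:
        fields = line.split()  # split on any run of whitespace
        if len(fields) != 3:
            continue  # ignore blank or malformed lines
        # Convert explicitly: the split fields are still strings here.
        rows.append([float(f) for f in fields])
    return rows


rows = read_xyz(["6.0 -3.23 15e12", "4.3 234 1000001", ""])
print(rows[0][1])  # -3.23
```

Indexing is then rows[line][column], which is the nested layout asked for (Python counts from 0 where Lua counts from 1).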

string comparison against factors in Stata

Suppose I have a factor variable with labels "a" "b" and "c" and want to see which observations have a label of "b". Stata refuses to parse
gen isb = myfactor == "b"
Sure, there is literally a "type mismatch", since my factor is encoded as an integer and so cannot be compared to the string "b". However, it wouldn't kill Stata to (i) perform the obvious parse or (ii) provide a translator function so I can write the comparison as label(myfactor) == "b". Using decode to (re)create a string variable defeats the purpose of encoding, which is to save space and make computations more efficient, right?
I hadn't really expected the comparison above to work, but I at least figured there would be a one- or two-line approach. Here is what I have found so far. There is a nice macro ("extended") function that maps the other way (from an integer to a label, seen below as local labi: label ...). Here's the solution using it:
// sample data
clear
input str5 mystr int mynum
a 5
b 5
b 6
c 4
end
encode mystr, gen(myfactor)
// first, how many groups are there?
by myfactor, sort: gen ng = _n == 1
replace ng = sum(ng)
scalar ng = ng[_N]
drop ng
// now, which code corresponds to "b"?
forvalues i = 1/`=ng'{
local labi: label myfactor `i'
if "b" == "`labi'" {
scalar bcode = `i'
continue, break
}
}
di bcode
The second step is what irks me, but I'm sure there's also a faster, more idiomatic way of performing the first step. Can I grab the length of the label vector, for example?
An example:
clear all
set more off
sysuse auto
gen isdom = 1 if foreign == "Domestic":`:value label foreign'
list foreign isdom in 1/60
This creates a variable called isdom that equals 1 if foreign's value label is equal to "Domestic". It uses an extended macro function.
From [U] 18.3.8 Macro expressions:
Also, typing
command that makes reference to `:extended macro function'
is equivalent to
local macroname : extended macro function
command that makes reference to `macroname'
This explains one of the two : in the offered syntax. The other can be explained by
... to specify value labels directly in an expression, rather than through
the underlying numeric value ... You specify the label in double quotes
(""), followed by a colon (:), followed by the name of the value
label.
The quote is from Stata tip 14: Using value labels in expressions, by Kenneth Higbee, The Stata Journal (2004). Freely available at http://www.stata-journal.com/sjpdf.html?articlenum=dm0009
Edit
On computing the number of distinct observations, another way is:
by myfactor, sort: gen ng = _n == 1
count if ng
scalar sc_ng = r(N)
display sc_ng
But yours is fine. In fact, it is documented here: http://www.stata.com/support/faqs/data-management/number-of-distinct-observations/, along with more methods and comments.

Read numbers following a keyword into an array in Fortran 90 from a text file

I have many text files of this format
....
<snip>
'FOP' 0.19 1 24 1 25 7 8 /
'FOP' 0.18 1 24 1 25 9 11 /
/
TURX
560231
300244
70029
200250
645257
800191
900333
600334
770291
300335
220287
110262 /
SUBTRACT
'TURX' 'TURY'/
</snip>
......
where the portions I snipped off contain other various data in various formats. The file format is inconsistent (machine generated), the only thing one is assured of is the keyword TURX which may appear more than once. If it appears alone on one line, then the next few lines will contain numbers that I need to fetch into an array. The last number will have a space then a forward slash (/). I can then use this array in other operations afterwards.
How do I "search" or parse a file of unknown format in fortran, and how do I get a loop to fetch the rest of the data, please? I am really new to this and I HAVE to use fortran. Thanks.
Fortran 95 / 2003 have a lot of string and file handling features that make this easier.
For example, this code fragment to process a file of unknown length:
use iso_fortran_env
character (len=100) :: line
integer :: ReadCode
ReadLoop: do
read (75, '(A)', iostat=ReadCode ) line
if ( ReadCode /= 0 ) then
if ( ReadCode == iostat_end ) then
exit ReadLoop
else
write ( *, '( / "Error reading file: ", I0 )' ) ReadCode
stop
end if
end if
! code to process the line ....
end do ReadLoop
Then the "process the line" code can contain several sections depending on a logical variable "Have_TURX". If Have_TURX is false you are "seeking": test whether the line contains "TURX". You could use a plain "==" if TURX is always at the start of the string, or for more generality you could use the intrinsic function "index" to test whether the string "line" contains TURX.
Once the program is in the mode where Have_TURX is true, you use "internal I/O" to read the numeric value from the string. Since the integers have varying lengths and are left-justified, the easiest way is to use "list-directed I/O". Combining these:
read (line, *) integer_variable
Then you could use the intrinsic function "index" again to test whether the string also contains a slash, in which case you change Have_TURX to false and end reading mode.
If you need to put the numbers into an array, it might be necessary to read the file twice, or to backspace the file, because you will have to allocate the array, and you can't do that until you know the size of the array. Or you could pop the numbers into a linked list, then when you hit the slash allocate the array and fill it from the linked list. Or if there is a known maximum number of values you could use a temporary array, then transfer the numbers to an allocatable output array. This is assuming that you want the output argument of the subroutine to be an allocatable array of the correct length, and that it returns one group of numbers per call:
integer, dimension (:), allocatable, intent (out) :: numbers
allocate (numbers (1: HowMany) )
P.S. There is a brief summary of the language features at http://en.wikipedia.org/wiki/Fortran_95_language_features and the gfortran manual has a summary of the intrinsic procedures, from which you can see what built in functions are available for string handling.
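The seek/read logic described above is language-independent, so here is a short sketch of it in Python for readers who want to trace the control flow before writing the Fortran. It is an illustration only: the function name read_turx_blocks is hypothetical, and it assumes every token inside a TURX block is an integer.

```python
def read_turx_blocks(lines, keyword="TURX"):
    """Return one list of integers per TURX block found in `lines`."""
    blocks = []
    current = None  # None = "seeking"; a list = "reading numbers"
    for raw in lines:
        line = raw.strip()
        if current is None:
            # Seeking mode: only a line consisting of the keyword counts.
            if line == keyword:
                current = []
            continue
        # Reading mode: collect the numbers; a slash ends the block.
        ended = "/" in line
        for token in line.replace("/", " ").split():
            current.append(int(token))  # ASSUMPTION: tokens are integers
        if ended:
            blocks.append(current)
            current = None  # back to seeking
    return blocks  # a block with no closing slash is dropped


sample = [
    "'FOP' 0.18 1 24 1 25 9 11 /",
    "/",
    "TURX",
    "560231",
    "300244",
    "110262 /",
    "SUBTRACT",
    "'TURX' 'TURY'/",
]
print(read_turx_blocks(sample))  # [[560231, 300244, 110262]]
```

The single `current` variable plays the role of the Have_TURX flag, and the growing list stands in for the linked list or temporary array mentioned above.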
I'll give you a nudge in the right direction so that you can finish your project.
Some basics:
Do/While: you'll need some sort of loop structure to loop through the file and then over the numbers. There's no for loop in Fortran, so use this type.
Read: to read the strings.
To start you need something like this:
program readlines
implicit none
character (len=30) :: rdline
integer,dimension(1000) :: array
! This sets up a character string of length 30 and an integer array with 1000 elements
!
open(18,file='fileread.txt')
do
read(18,*) rdline
if (trim(rdline).eq.'TURX') exit !loop until the trimmed off portion matches TURX
end do
See this thread for a way to turn your strings into integers.
Final edit: Looks like MSB has got most of what I just found out. The iostat argument of the read is the key to it. See this site for a sample program.
Here was my final way around it.
PROGRAM fetchnumbers
implicit none
character (len=50) ::line, numdata
logical ::is_numeric
integer ::I,iost,iost2,counter=0,number
integer, parameter :: long = selected_int_kind(10)
integer, dimension(1000)::numbers !Can the number of numbers be up to 1000?
open(20,file='inputfile.txt') !assuming file is in the same location as program
ReadLoop: do
read(20,*,iostat=iost) line !read data line by line
if (iost .LT. 0) exit !end of file reached before TURX was found
if (len_trim(line)==0) cycle ReadLoop !ignore empty lines
if (index(line, 'TURX').EQ.1) then !prepare to begin capturing
GetNumbers: do
read(20, *,iostat=iost2)numdata !read in the numbers one by one
if (.NOT.is_numeric(numdata)) exit !no more numbers to read
if (iost2 .LT. 0) exit !end of file reached while fetching numbers
read (numdata,*) number !read string value into a number
counter = counter + 1
Storeloop: do I =1,counter
if (I<counter) cycle StoreLoop
numbers(counter)=number !storing data into array
end do StoreLoop
end do GetNumbers
end if
end do ReadLoop
write(*,*) "Numbers are:"
do I=1,counter
write(*,'(I14)') numbers(I)
end do
END PROGRAM fetchnumbers
FUNCTION is_numeric(string)
IMPLICIT NONE
CHARACTER(len=*), INTENT(IN) :: string
LOGICAL :: is_numeric
REAL :: x
INTEGER :: e
is_numeric = .FALSE.
READ(string,*,IOSTAT=e) x
IF (e == 0) is_numeric = .TRUE.
END FUNCTION is_numeric

Lua base converter

I need a base converter function for Lua. I need to convert from base 10 to base 2,3,4,5,6,7,8,9,10,11...36. How can I do this?
In the string to number direction, the function tonumber() takes an optional second argument that specifies the base to use, which may range from 2 to 36 with the obvious meaning for digits in bases greater than 10.
In the number to string direction, this can be done slightly more efficiently than Nikolaus's answer by something like this:
local floor,insert = math.floor, table.insert
function basen(n,b)
n = floor(n)
if not b or b == 10 then return tostring(n) end
local digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
local t = {}
local sign = ""
if n < 0 then
sign = "-"
n = -n
end
repeat
local d = (n % b) + 1
n = floor(n / b)
insert(t, 1, digits:sub(d,d))
until n == 0
return sign .. table.concat(t,"")
end
This creates fewer garbage strings to collect by using table.concat() instead of repeated calls to the string concatenation operator (..). Although it makes little practical difference for strings this small, this idiom should be learned because otherwise building a buffer in a loop with the concatenation operator will actually tend to O(n^2) performance while table.concat() has been designed to do substantially better.
There is an unanswered question as to whether it is more efficient to push the digits on a stack in the table t with calls to table.insert(t,1,digit), or to append them to the end with t[#t+1]=digit, followed by a call to string.reverse() to put the digits in the right order. I'll leave the benchmarking to the student. Note that although the code I pasted here does run and appears to get correct answers, there may be other opportunities to tune it further.
For example, the common case of base 10 is culled off and handled with the built in tostring() function. But similar culls can be done for bases 8 and 16 which have conversion specifiers for string.format() ("%o" and "%x", respectively).
Also, neither Nikolaus's solution nor mine handle non-integers particularly well. I emphasize that here by forcing the value n to an integer with math.floor() at the beginning.
Correctly converting a general floating point value to any base (even base 10) is fraught with subtleties, which I leave as an exercise to the reader.
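For comparison, here is the append-then-reverse variant mentioned above, sketched in Python rather than Lua. It is an illustration of the algorithm, not a benchmark; the explicit range check on the base is an addition that the Lua version does not have.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def basen(n, b=10):
    """Convert integer n to its string representation in base b (2..36)."""
    if not 2 <= b <= 36:
        raise ValueError("base must be in 2..36")
    n = int(n)  # truncates toward zero; math.floor in the Lua code rounds down
    sign = "-" if n < 0 else ""
    n = abs(n)
    out = []
    while True:
        n, d = divmod(n, b)  # quotient and remainder in one step
        out.append(DIGITS[d])  # least significant digit first
        if n == 0:
            break
    # Digits were collected in reverse order, so flip them once at the end.
    return sign + "".join(reversed(out))


print(basen(255, 16))  # FF
print(basen(-10, 2))   # -1010
```

Python's built-in int(text, base) performs the inverse conversion, much like tonumber() with a base argument in Lua, which makes a round-trip check easy.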
You can use a loop to convert an integer into a string in the required base. For bases below 10 use the following code; if you need a base larger than that, you need to add a line that maps the result of x % base to a character (using an array, for example).
x = 1234
r = ""
base = 8
while x > 0 do
r = "" .. (x % base ) .. r
x = math.floor(x / base)
end
print( r );
