Split or tokenize within Stata program with using statement?

I am trying to use a program to speed up a repetitive Stata task. This is the first part of my program:
program alphaoj
syntax [varlist] , using(string) occ_level(integer) ind_level(integer)
import excel `using', firstrow
display "`using'"
split "`using'", parse(_)
local year = `2'
display "`year'"
display `year'
When I run this program, using the line alphaoj, ind_level(4) occ_level(5) using("nat4d_2002_dl.xls"), I receive the error factor-variable and time-series operators not allowed r(101);
I am not quite sure what is being treated as a factor or time series operator.
I have replaced the split line with tokenize, and the parse statement with parse("_"), and I continue to run into errors. In that case, it says _ not found r(111);
Ideally, I would have it take the year from the filename and use that year as the local.
I am struggling with how I should perform this seemingly simple task.

An error is returned because the split command only accepts string variables. You can't pass a string directly to it. See help split for more details.
You can achieve your goal of extracting the year from the filename and storing that as a local macro. See below:
program alphaoj
syntax [varlist], using(string)
import excel `using', firstrow
gen stringvar = "`using'"
split stringvar, parse(_)
local year = stringvar2
display `year'
end
alphaoj, using("nat4d_2002_dl.xls")
The last line prints "2002" to the console.
Alternative solution that avoids creating an extra variable:
program alphaoj
syntax [varlist], using(string)
import excel `using', firstrow
local year = substr("`using'",7,4)
di `year'
end
alphaoj, using("nat4d_2002_dl.xls")
Please note that this solution relies on every Excel filename having the same structure, with the four-digit year starting at character 7.

Lua Pattern matching only returning first match

I can't figure out how to get Lua to return ALL matches for a particular pattern match.
I have the following regex which works and is so basic:
.*\n
This just splits a long string per line.
The equivalent of this in Lua is:
.-\n
If you run the above in a regex website against the following text it will find three matches (if using the global flag).
Hello
my name is
Someone
If you do not use the global flag it will return only the first match. This is the behaviour of Lua; it's as if it does not have a global switch and will only ever return the first match.
The exact code I have is:
local test = {string.match(string_variable_here, ".-\n")}
If I run it on the above test for example, test will be a table with only one item (the first row). I even tried using capture groups but the result is the same.
I cannot find a way to make it return all occurrences of a match. Does anyone know if this is possible in Lua?
Thanks,
You can use string.gmatch(s, pattern) / s:gmatch(pattern):
This returns a pattern finding iterator. The iterator will search through the string passed looking for instances of the pattern you passed.
See the online Lua demo:
local a = "Hello\nmy name is\nSomeone\n"
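for i in string.gmatch(a, ".-\n") do
  print(i)
end
Note that in Lua patterns . matches any character, including a newline, so the greedy .*\n would consume the whole string up to the last newline and produce a single match. The - quantifier in Lua patterns is the equivalent of the *? non-greedy ("lazy") regex quantifier, so .-\n stops at the first newline and yields one match per line.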
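If you need the matches in a table, as in the question, the iterator can be drained in a loop. This is only a minimal sketch: the helper name collect_matches is made up for illustration, and the pattern [^\n]+ is swapped in so that each line is captured without its trailing newline.

```lua
-- Hypothetical helper (not part of the Lua standard library):
-- gather every match of `pattern` in `s` into a table.
local function collect_matches(s, pattern)
  local matches = {}
  for m in string.gmatch(s, pattern) do
    matches[#matches + 1] = m
  end
  return matches
end

local text = "Hello\nmy name is\nSomeone\n"
-- "[^\n]+" matches each run of non-newline characters, i.e. one line.
local lines = collect_matches(text, "[^\n]+")
print(#lines)    -- 3
print(lines[2])  -- my name is
```

Unlike string.match, which stops after the first hit, this visits every non-overlapping match from left to right.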

Pipe character ignored in SPSS syntax

I am trying to use the pipe character "|" in SPSS syntax with strange results:
In the syntax it appears like this:
But when I copy this line from the syntax window to here, this is what I get:
SELECT IF(SEX = 1 SEX = 2).
The pipe just disappears!
If I run this line, this is the output:
SELECT IF(SEX = 1 SEX = 2).
Error # 4007 in column 20. Text: SEX
The expression is incomplete. Check for missing operands, invalid operators,
unmatched parentheses or excessive string length.
Execution of this command stops.
So the pipe is invisible to the program too!
When I save this syntax and reopen it, the pipe is gone...
The only way I found to get SPSS to work with the pipe is when I edited the syntax (adding the pipe) and saved it in an alternative editor (notepad++ in this case). Now, without opening the syntax, I ran it from another syntax using insert command, and it worked.
EDIT: some background info:
I have spss version 23 (+service pack 3) 64 bit.
The same thing happens whether I use my locale (encoding: windows-1255) or Unicode (encoding: UTF-8). Suspecting my Hebrew keyboard, I tried copying syntax from the web, with the same results.
Can anyone shed any light on this subject?
It turns out (according to SPSS support) that this is a version-specific bug (ver. 21) that was fixed in later versions.

How to split particular words in lua

I am trying to split this statement in Lua
sendex,000D6F0011BA2D60,fb,btn,1,on,100,null
I need output like this:
Mac:000D6F0011BA2D60
Value:1
command:on
value:100
How can I split the string and get the values?
local input = "sendex,000D6F0011BA2D60,fb,btn,1,on,100,null"
local buffer = {}
for word in input:gmatch('[^,]+') do
table.insert(buffer, word)
--print(word) -- uncomment this to see the words as they are being matched ;)
end
print("Mac:"..buffer[2])
print("Value:"..buffer[5])
...
For a complete explanation of what string.gmatch does, see the Lua reference. To summarize, it iterates over a string and searches for a pattern, in this case [^,]+, meaning all groups of 1 or more characters that aren't a comma. Every time it finds said pattern, it does something with it and continues searching.
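One thing to keep in mind with [^,]+ is that it skips empty fields, so a record such as a,,b would shift the later indices. If that matters, a variation along these lines keeps empty fields in place. It is only a sketch: split_keep_empty is a made-up name, the separator is assumed to be a single character with no special meaning in Lua patterns, and the sample record with an empty third field is invented for illustration.

```lua
-- Hypothetical helper: split `s` on `sep`, keeping empty fields.
-- Assumes `sep` is one character that is not magic in Lua patterns
-- (a comma is fine; "." or "%" would need escaping).
local function split_keep_empty(s, sep)
  local fields = {}
  -- Appending one separator lets every field, including the last,
  -- be matched as "zero or more non-separators followed by a separator".
  for field in (s .. sep):gmatch("([^" .. sep .. "]*)" .. sep) do
    fields[#fields + 1] = field
  end
  return fields
end

-- Invented sample: same record as above, but with the third field empty.
local fields = split_keep_empty("sendex,000D6F0011BA2D60,,btn,1,on,100,null", ",")
print("Mac:" .. fields[2])    -- Mac:000D6F0011BA2D60
print("Value:" .. fields[5])  -- Value:1
```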
If your input is exactly like you have described, the code below works:
s="sendex,000D6F0011BA2D60,fb,btn,1,on,100,null"
Mac,Value,command,value = s:match(".-,(.-),.-,.-,(.-),(.-),(.-),")
print(Mac,Value,command,value)
It uses the non-greedy pattern .- to split the input into fields. It also captures the relevant fields.

Lua--parse text from .txt file and store the values

I receive the following error when I try to run my code:
lua:readFile.lua:7: attempt to call method 'split' (a nil value)
I am teaching myself Lua and doing some exercises. I am trying to parse out the individual values in a text file and then do stuff with them. I can open the file and if I don't try to parse out the values I can print the contents.
I have tried, separately:
dollars, tickets = line:split(" ")
dollars, tickets = line:split("(%w+)", " ")
Along with several other iterations I cannot recall at this point.
Here is my code:
myfile = io.open("C:\\tickets.txt", "r")
if myfile then
print("True") --test print
for line in myfile:lines() do
local dollars, tickets = unpack(line:split(" "))
print(dollars)
end
end
print("Done") --test print
myfile:close()
Here is the content of the tickets.txt file in its entirety:
250 5750
100 28000
50 35750
25 18750
I am obviously missing something in the split method but I do not know enough to know what.
Regards.
If you only want to read numbers from the file and do not need to enforce exactly two per line, you can use this code:
while true do
local dollars,tickets = myfile:read("*n","*n")
if dollars==nil or tickets==nil then break end
print(dollars)
end
The string library in Lua doesn't include a 'split' function. You will have to implement one yourself (there are examples on the Lua wiki), or use Lua's pattern matching functionality to parse out the pieces. For example, you could do something like this:
local dollars, tickets = line:match("(%d+) (%d+)")
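If you would rather have a reusable split, one possible implementation built on gmatch is sketched below. The name split_ws is made up, it splits on runs of whitespace only, and the sample lines are copied from the tickets.txt contents in the question rather than read from disk.

```lua
-- Hypothetical helper: split a line into tokens on runs of whitespace.
local function split_ws(line)
  local parts = {}
  for token in line:gmatch("%S+") do
    parts[#parts + 1] = token
  end
  return parts
end

-- Sample lines taken from tickets.txt in the question.
local sample = { "250 5750", "100 28000", "50 35750", "25 18750" }
for _, line in ipairs(sample) do
  local parts = split_ws(line)
  -- The tokens are strings; convert them before doing arithmetic.
  local dollars, tickets = tonumber(parts[1]), tonumber(parts[2])
  print(dollars, tickets)
end
```

In the original loop, the body would become local parts = split_ws(line) in place of the unpack(line:split(" ")) call.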

How to write an array into a text file in maxima?

I am relatively new to maxima. I want to know how to write an array into a text file using maxima.
I know it's late in the game for the original post, but I'll leave this here in case someone finds it in a search.
Let A be a Lisp array, Maxima array, matrix, list, or nested list. Then:
write_data (A, "some_file.data");
Let S be an output stream (created by openw or opena). Then:
write_data (A, S);
Entering ?? numericalio at the input prompt, or ?? write_ or ?? read_, will show some info about this function and related ones.
I've never used maxima (or even heard of it), but a little Google searching out of curiosity turned up this: http://arachnoid.com/maxima/files_functions.html
From what I can gather, you should be able to do something like this:
stringout("my_new_file.txt",values);
It says the second parameter to the stringout function can be one or more of these:
input: all user entries since the beginning of the session.
values: all user variable and array assignments.
functions: all user-defined functions (including functions defined within any loaded packages).
all: all of the above. Such a list is normally useful only for editing and extraction of useful sections.
So by passing values it should save your array assignments to file.
A bit more necroposting, since Google leads here but I didn't find the existing answers useful enough. I needed to export the data in the following format:
-0.8000,-0.8000,-0.2422,-0.242
-0.7942,-0.7942,-0.2387,-0.239
-0.7776,-0.7776,-0.2285,-0.228
-0.7514,-0.7514,-0.2124,-0.212
-0.7168,-0.7168,-0.1912,-0.191
-0.6750,-0.6750,-0.1655,-0.166
-0.6272,-0.6272,-0.1362,-0.136
-0.5746,-0.5746,-0.1039,-0.104
So I've found how to do this with printf:
with_stdout(filename, for i:1 thru length(z_points) do
printf (true,"~,4f,~,4f,~,4f,~,3f~%",bot_points[i],bot_points[i],top_points[i],top_points[i]));
A slightly cleaner variation on @ProdoElmit's answer:
list : [1,2,3,4,5]$
with_stdout("file.txt", apply(print, list))$
/* 1 2 3 4 5 is then what appears in file.txt */
Here the trick with apply is needed as you probably don't want to have square brackets in your output, as is produced by print(list).
For a matrix to be printed out, I would have done the following:
m : matrix([1,2],[3,4])$
with_stdout("file.txt", for row in args(m) do apply(print, row))$
/* 1 2
3 4
is what you then have in file.txt */
Note that in my solution the values are separated with spaces and the format of your values is fixed to that provided by print. Another caveat is that there is a limit on the number of function parameters: for example, for me (GCL 2.6.12) my method does not work if length(list) > 64.
