I'm currently learning SAS and I was given some code to adapt and reuse, but first I have to understand it. My question is about just a small part of it (the top of the code). Here it is:
%let dir=/home/user/PROJECT/CODES/;
%let dir_project=/home/user/PROJECT/;
libname inp "&dir_project" compress=yes;
libname out "&dir.out" compress=yes;
%let tg=out.vip;
My questions are:
What does &dir.out mean? What is it referring to? I suppose it's something called "out". Is it looking for a database OUT? If yes and all my databases are usually temporary ones in WORK should I change it to WORK.OUT?
What is the resulting path of "tg"? I doubt that it is: "/home/user/PROJECT/CODES/out.vip".
Originally the code was referring to some locations on C: drive but I work entirely in SAS Studio so I have to adapt it.
Thank you in advance
The first two statements define two macro variables, DIR and DIR_PROJECT. In the second two statements you use those macro variables to define two librefs, INP and OUT. The last statement just defines another macro variable named TG.
Macro variable references start with & and are followed by the name of the macro variable to expand. SAS will stop looking for the macro variable name when it sees a period or any other character that cannot be part of a macro variable name. That is why the first libname statement uses the value of the DIR_PROJECT macro variable instead of the DIR macro variable. The period in the second libname statement tells SAS that you want to replace &dir. with the value of the macro variable DIR. If you had instead just written &dirout then SAS would look for a macro variable named DIROUT.
Macro variables just contain text. The meaning of the text depends on what SAS code you generate with them. So the first two macro variables look like they contain absolute paths to directories on your Unix file system, since they start from the root node / and end with a /. This is confirmed by how you use them to generate libname statements.
By adding the constant text out after the path in the second libname statement the result is that you are looking for a sub-directory named out in the directory that the value of the macro variable DIR names.
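The resolution rule described above can be mimicked in a few lines of Python. This is only a simplified model of the rule (a name after &, optionally closed by one period), not how SAS itself is implemented; the function name `resolve` and the dictionary `macro_vars` are made up for the illustration.

```python
import re

# Simplified model of SAS macro variable resolution (not SAS itself):
# a reference is "&" + a name, optionally closed by a single period.
_REF = re.compile(r"&([A-Za-z_][A-Za-z0-9_]*)(\.?)")

def resolve(text, macro_vars):
    """Expand &name and &name. references using the given dictionary."""
    def expand(match):
        name = match.group(1).upper()  # macro variable names ignore case
        return macro_vars[name]        # the optional period is consumed
    return _REF.sub(expand, text)

# Values taken from the %let statements in the question.
macro_vars = {
    "DIR": "/home/user/PROJECT/CODES/",
    "DIR_PROJECT": "/home/user/PROJECT/",
}

print(resolve("&dir.out", macro_vars))      # /home/user/PROJECT/CODES/out
print(resolve("&dir_project", macro_vars))  # /home/user/PROJECT/
```

In this model `resolve("&dirout", macro_vars)` raises a KeyError because no DIROUT entry exists. Real SAS instead writes a warning to the log and leaves the text unresolved.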
As for the last macro variable TG, what it means depends on how it is used. Since it has the form of two names separated by a period, it looks like it can be used to refer to a SAS dataset, especially since the first name is the same as one of the librefs that you defined in the libname statements. So you might use that macro variable in code like this:
proc print data=&tg ; run;
Which would be expanded into:
proc print data=out.vip ; run;
In that case you are looking for the SAS dataset named VIP in the library named OUT. So you would be looking for the Unix file named:
/home/user/PROJECT/CODES/out/vip.sas7bdat
Now if you used that macro variable in some SQL code like this:
select &tg ...
Then it would expand to
select out.vip ....
and in that case you would be referencing a variable named VIP in an input dataset named (or aliased as) OUT.
1 - &dir. is a macro variable. The period marks the end of the variable, and thus &dir.out resolves to /home/user/PROJECT/CODES/out at runtime. Your libname statement will now link the libref out to this physical location.
2 - the tg variable is a dataset reference, in the form "library.dataset". Here, out is the library, and vip is the dataset. This way you can write code such as:
data &tg.;
set sashelp.class;
run;
To create the dataset vip in the out library.
In this way, you are in fact (almost) right. The resulting path of &tg. (which resolves to out.vip) will be /home/user/PROJECT/CODES/out/vip.sas7bdat.
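As a rough illustration of how a two-level name such as out.vip maps to a file, here is a small Python sketch. It assumes a Unix host, where the dataset file name is the lowercase member name plus the .sas7bdat extension; the helper `dataset_file` and the dictionary `libref_paths` are hypothetical names, not part of SAS.

```python
import posixpath

def dataset_file(two_level_name, libref_paths):
    """Map 'libref.member' to the file a Unix SAS session would look for."""
    libref, member = two_level_name.split(".")
    directory = libref_paths[libref.upper()]  # librefs ignore case
    # Assumption: on Unix the dataset file name is lowercase.
    return posixpath.join(directory, member.lower() + ".sas7bdat")

# Directories assigned by the two libname statements in the question.
libref_paths = {
    "INP": "/home/user/PROJECT/",
    "OUT": "/home/user/PROJECT/CODES/out",
}

print(dataset_file("out.vip", libref_paths))
# /home/user/PROJECT/CODES/out/vip.sas7bdat
```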
Related
Let's say I've imported data from excel that has many, many variables, say v1 through v4000. Each of these is intended to be numeric, and most cases have numeric-only values, but there are some cases that have non-numeric characters. For some of those non-numerics, I know the meaning (e.g., "NA" for missing), and potentially some unknown strings that should be investigated.
For each variable, I think I would like to do something like 1) create a numeric version of that variable that has the original values for all cases that had numeric values, 2) create a list of unique string values for cases with non-numerics so those can be investigated. With 4,000 variables, I would ideally use some type of loop to do this.
How can that be done? Is it even possible?
I was able to solve this using the below macro, which creates a new variable with a "_str" suffix that holds the original values, and which can therefore be used to report frequencies of values that were turned into system missing values.
DEFINE destringvars(names=!cmdend)
!do !i !in (!names)
RENAME VARIABLES (!i=!concat(!i,"_str")).
STRING !i (A9).
compute !i=!concat(!i,"_str").
alter type !i(f8).
TEMPORARY.
SELECT if SYSMIS(!i).
FREQUENCY !concat(!i,"_str").
!DOEND
EXECUTE.
!enddefine
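For comparison, the two steps from the question (keep the numeric values, collect the unexpected strings) can be sketched in plain Python. This is not SPSS syntax and is not what the macro above does internally; the function `split_numeric`, the `missing_codes` default and the sample `columns` data are all invented for the illustration.

```python
def split_numeric(values, missing_codes=("NA", "")):
    """Return (numbers, unknown) for one column of string values.

    numbers holds a float per case, or None where the value was a known
    missing code or could not be parsed; unknown holds the unparsed strings.
    Note that float() also accepts text such as "nan" and "inf".
    """
    numbers = []
    unknown = set()
    for raw in values:
        text = raw.strip()
        if text in missing_codes:
            numbers.append(None)
            continue
        try:
            numbers.append(float(text))
        except ValueError:
            numbers.append(None)
            unknown.add(text)   # keep the string for later investigation
    return numbers, unknown

# Hypothetical sample data standing in for v1 ... v4000.
columns = {"v1": ["1", "2.5", "NA"], "v2": ["7", "n/a?", "3"]}
for name, values in columns.items():
    numbers, unknown = split_numeric(values)
    print(name, numbers, sorted(unknown))
```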
(Some background: I am not experienced with lldb or python and don't work on them frequently, but currently need to make some basic scripts for debugging an iphone program)
I am currently stopped at a breakpoint inside a function, and want to check the value of an array that has been accessed inside this function
This array is declared as
Float32 my_array[128];
and has global scope. I can view the array using the print command, but I would like to make a python script so that I have more control over the output formatting and possibly plot the array elements as a graph using matplotlib later on.
I am looking at the sample python code given in this question, and using the python given there I have verified that I can view local variables in this function (where currently I am stopped at a break point). For example, if I change 'base' in base=frame.FindVariable('base') to my local variable 'k' (the local variable is not an array) ,
base=frame.FindVariable('k')
then print base I can see the value of k. However, if I try this,
base=frame.FindVariable('my_array')
and do print base it gives me No value. How can I write a python command to get the values of any kind of variable currently in scope? Preferably it works for normal variables (int, float), arrays, and pointers, but if not, finding the values of arrays is more important at the moment.
SBFrame.FindVariable searches among the variables local to that frame. It doesn't search among the global variables.
For that you need to use a search with a wider scope. If you know that the global variable is in the binary image containing your frame's code (lldb calls that binary image a Module), then you can find the module containing that frame and use SBModule.FindGlobalVariables. If that's not true, you can search the whole target using SBTarget.FindGlobalVariables. If you know that only one global variable of that name exists, you can use the FindFirstGlobalVariable variant.
All these commands will find variables of any type, and they all consistently return SBValues so you can format them in a consistent manner regardless of how you find them. For statically allocated arrays, the array elements are its children, so you can fetch individual elements with SBValue.GetChildAtIndex.
You can get to a SBFrame's module like:
module = frame.module
and its target:
target = frame.thread.process.target
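Putting those pieces together, a minimal sketch of reading the global array from the question could look like this. The function name `read_global_array` is hypothetical, and the sketch assumes the variable is a statically allocated array of numeric elements whose values convert with float().

```python
def read_global_array(frame, name):
    """Return the elements of a global array as a list of floats.

    `frame` is expected to be an lldb.SBFrame, for example the selected
    frame while stopped at a breakpoint.
    """
    target = frame.thread.process.target
    value = target.FindFirstGlobalVariable(name)   # returns an SBValue
    if not value.IsValid():
        raise LookupError("no global variable named %r" % name)
    # For a statically allocated array the children are its elements.
    return [float(value.GetChildAtIndex(i).GetValue())
            for i in range(value.GetNumChildren())]
```

From lldb's interactive script interpreter, where lldb.frame refers to the currently selected frame, this could be called as read_global_array(lldb.frame, 'my_array'). The resulting list can then be formatted or handed to a plotting library.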
lldb separates the contexts in which to search for variables primarily for efficiency. If SBFrame.FindVariable searched for globals as well as locals, a mistyped variable name would be a much more expensive mistake. But it also makes the call more predictable since you will never get some random global from some shared library that the system loaded on your behalf.
I am using the Data > Merge Files > Add Variables in SPSS. The two .sav files both contain a variable called "Student_No" which is numeric with the same width in each file. I am using this as the key variable in which to match cases. I am not indicating that cases are not sorted. It makes no difference if I indicate that the active or non-active data set is keyed. In either case the new variables are not properly matched with the cases.
What are some of the potential problems that might be causing this mismatch?
The dialog box pastes STAR JOIN syntax in some cases and MATCH FILES in others. There were some problems with STAR JOIN in older versions of Statistics, so you might need to use MATCH FILES instead. See the Command Syntax Reference for that command on how to do this.
I am not used to SPSS so this question will sound stupid:
I need to change fragments of a cell in SPSS, for example:
'1.28'
'2.69'
'3.57'
to
'a.28'
'b.69'
'c.57'
What's the best way to do it?
Tks.
This is assuming the variable you want to recode is called 'VarA', and that it is numeric.
This creates a copy of the variable, converts it to a string, and then uses those values to create a new version that is recoded.
RECODE VarA (ELSE = COPY) INTO VarA_String.
ALTER TYPE VarA_String(A8).
EXECUTE.
STRING VarA_r (A8).
COMPUTE VarA_r=REPLACE(VarA_String,'1.','a.').
COMPUTE VarA_r=REPLACE(VarA_r,'2.','b.').
COMPUTE VarA_r=REPLACE(VarA_r,'3.','c.').
EXECUTE.
The syntax is a little different in SPSS Modeler and bear with me as I can only attach one image until I have a certain reputation on SO.
After you convert VarA into a string (which I called to_str) you can use the replace command to change part of the substring, ie:
to_string(VarA)
for the first Derive node, and:
replace('1.','a.',to_str)
for the second Derive node. This command replaces all occurrences of SUBSTRING1 with SUBSTRING2 in STRING, so you will get the same result in Modeler. See the sample stream here.
Assuming that these are strings, see the replace function in COMPUTE. If there are just a few, though, just edit the cells in the Data Editor.
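One thing to watch with a plain substring replace is that '1.' also matches inside a value such as '11.28'. The following Python sketch shows the alternative of recoding only the whole part before the period. It illustrates the logic only and is not SPSS or Modeler syntax; `recode_prefix` and `mapping` are hypothetical names.

```python
def recode_prefix(value, mapping):
    """Replace the part before the first period using an exact-match lookup."""
    whole, dot, fraction = value.partition(".")
    # Prefixes without a mapping entry are left unchanged.
    return mapping.get(whole, whole) + dot + fraction

mapping = {"1": "a", "2": "b", "3": "c"}
print([recode_prefix(v, mapping) for v in ["1.28", "2.69", "3.57", "11.28"]])
# ['a.28', 'b.69', 'c.57', '11.28']
```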
Frequently, PSPP/SPSS syntax documentation (example) suggests I must pass a list of variables with /VARIABLES=var_list and this is not an optional subcommand.
But I have a lot of datasets to process. I would like to programmatically get a list of all variables in the active dataset, pass that to a procedure, and then generate a file from the procedure output.
I've tried /VARIABLES=* but that didn't work.
error: DESCRIPTIVES: Syntax error at `*': expecting variable name.
You can use display variables. or display dictionary. to generate a table of all variables and their attributes which could then be captured using OMS. However if you want to pass all variables to a function that expects a list you can use all, i.e. descriptives /variables= all..
/VARIABLES is often optional in procedures, but ALL stands for all the variables. If you need to refine by a particular measurement level, type or other metadata, try the SPSSINC SELECT VARIABLES extension command. This command applies various filters based on the metadata and creates a macro with the variables that pass. You can then use that in any context.