foreach command in Stata

I am using panel data, where the variable countrynum is the country number for 138 countries and icr is the independent variable. To conduct a poolability test I have to run the below code to get the variables icr_1, icr_2, icr_3 ... icr_138.
However, the code only generates icr_1. Can someone help me understand why? I need all 138 variables.
xi, prefix(C) i.countrynum
gen Ccountrynum_1=1 if countrynum==1
replace Ccountrynum_1=0 if countrynum!=1
foreach var of varlist icr {
    foreach num of numlist 1(1)138 {
        gen `var'_`num' = `var' * Ccountrynum_`num'
    }
}

There are some things in your code that I would do differently, but I don't see anything that brings up an error. Rather than debugging code I think it's more useful to suggest an easier way for what you seem to be doing:
separate icr, by(countrynum)
xi is an older command that has been superseded by factor-variable notation, so you only need xi if you're using an older command that doesn't support factor variables, which I don't think is the case here.
To do a poolability test as I understand it you can run a regression with i.countrynum like this:
reg y x1 x2 x... i.countrynum
testparm i.countrynum
The output of testparm will tell you whether the country dummies are jointly significant.

I don't follow this easily. Let's first note that
gen Ccountrynum_1=1 if countrynum==1
replace Ccountrynum_1=0 if countrynum!=1
simplifies to
gen Ccountrynum_1 = countrynum == 1
That said, the double loop
foreach var of varlist icr {
    foreach num of numlist 1(1)138 {
        gen `var'_`num' = `var' * Ccountrynum_`num'
    }
}
simplifies to a single loop
forval num = 1/138 {
    gen icr_`num' = icr * Ccountrynum_`num'
}
That said, it's hard to understand why that code should be expected to work, since you only explain how Ccountrynum_1 is generated.
It's really unusual to need that number of extra variables. In addition to @Wouter Wakker's suggestion, tabulate, generate() allows generation of indicator variables without a loop, for whenever they really are essential.
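A minimal sketch of that tabulate, generate() route, assuming countrynum takes the consecutive values 1 to 138 so that the numbering of the generated dummies coincides with the country number:
* create Ccountrynum_1 ... Ccountrynum_138 in one step
tabulate countrynum, generate(Ccountrynum_)
* the interactions then need only a single loop
forvalues num = 1/138 {
    gen icr_`num' = icr * Ccountrynum_`num'
}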

Related

Stata: using foreach to rename numeric variables

I have a large dataset where subsets of variables have been entered with the same prefix, followed by an underscore and some details. They are all binary yes/no variables and they are all stored as doubles. For example, I have the variables onsite_healthclinic and onsite_CBO, whose values can only be 1 or 0.
I want to rename them all according to the question they correspond to on the survey I'm working from (so the above variables would become q0052_healthclinic and q0052_CBO), but if I use the code below with substr I (obviously) get a type mismatch error:
foreach var in onsite_healthclinic onsite_CBO {
    local new = substr(`var', 8, .)
    rename `new' q0052_`new'
}
My question is: is there another command, other than substr, that I can use so that I don't have to either a) convert all of the variables to strings first, or b) rename them all manually (there are ~20 in each subset, so while doable, it's a waste of time)?
There is no need for a loop here at all. Although the essential answer is one line long I give here a complete, self-contained answer.
clear
set obs 1
foreach v in onsite_healthclinic onsite_CBO {
    gen `v' = 1
}
rename onsite_* q0052_*
describe, fullnames
This answer implies that you've not studied the help under rename groups.
Will this work?
foreach var in onsite_healthclinic onsite_CBO {
    local new = substr("`var'", 8, .)
    rename onsite_`new' q0052_`new'
}
I added quotes around the local macro reference inside the substr() function and added onsite_ back into the rename, and that seemed to work.

Stata: perform a foreach loop to calculate kappa across a large data file

I have a data file in Stata with 50 variables
j-r-hp j-p-hp j-m-hp p-c-hp p-r-hp p-p-hp p-m-hp ... etc,
I want to perform a weighted kappa between pairs, so that the first might be
kap j-r-hp j-p-hp, wgt(w2)
and the next would be
kap j-r-hp j-m-hp, wgt(w2)
I am new to Stata. Is there a straightforward way to use a loop for this, like a foreach loop?
Your variable names are not legal names in Stata, so I've changed the hyphens to underscores in the example below. Also, I don't know what it means to 'perform a weighted kappa', so my answer uses random normal variables and the corr[elate] command. You can use the results that Stata leaves behind in r() (see return list) to gather the results for the separate analyses.
The idea is to gather the variables in a list using a local, then to loop over each element in that list (but skipping the repeated pairs using continue). If you have many variables with structured names, you could instead use ds, which leaves r(varlist) in r(). Have a look at the help file for macros (help macro and help extended_fcn), especially the section on 'Macro extended functions for parsing'. Hope this helps.
clear
set obs 100
local vars j_r_hp j_p_hp j_m_hp p_c_hp p_r_hp p_p_hp p_m_hp
foreach var of local vars {
gen `var'=rnormal()
}
forval ii = 1/`: word count `vars'' {
    forval jj = 1/`: word count `vars'' {
        if `ii' < `jj' continue  // with continue, each pair is run only once (when ii >= jj)
        corr `: word `ii' of `vars'' `: word `jj' of `vars''
    }
}
You can take advantage of the user-written command tuples (run ssc install tuples):
clear
set more off
*----- example data -----
set obs 100
local vars j_r_hp j_p_hp j_m_hp p_c_hp p_r_hp p_p_hp p_m_hp
foreach var of local vars {
gen `var' = abs(round(rnormal()*100))
}
*----- what you want -----
tuples `vars', min(2) max(2)
forvalues i = 1/`ntuples' {
    display _newline(3) "variables `tuple`i''"
    kappa `tuple`i''
}
How you get the variable names together to feed them into tuples will depend on the dataset.
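For instance, if the names share a recognisable pattern such as the _hp suffix used here, ds can collect them for you (a sketch assuming that naming pattern), after which the tuples call above works unchanged:
ds *_hp                 // leaves the matching variable names in r(varlist)
local vars `r(varlist)'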
This is a variation on the helpful answer by @Matthijs, but it really won't fit well into a comment. The main extra twists are
The use of tokenize to avoid repeated use of word # of. After tokenize the separate words of the argument (here separate variable names) are held in macros 1 up. Thus tokenize a b c puts a in local macro 1, b in local macro 2 and c in local macro 3. Nested macro references are treated exactly like parenthesised expressions in elementary algebra; what is on the inside is evaluated first.
Focusing directly on part of the notional matrix of results on one side of the diagonal. The small trick is to ensure that one matrix subscript exceeds the other subscript.
Random normal input doesn't make sense for kap, but you will be using your own data anyway.
clear
set obs 100
local vars j_r_hp j_p_hp j_m_hp p_c_hp p_r_hp p_p_hp p_m_hp
foreach var of local vars {
gen `var' = rnormal()
}
tokenize `vars'
local p : word count `vars'
local pm1 = `p' - 1
forval i = 1/`pm1' {
    local ip1 = `i' + 1
    forval j = `ip1'/`p' {
        di "``i'' and ``j''"
        kap ``i'' ``j''
        di
    }
}
I thought I might add my own answer in addition to highlight a few things.
The first thing to note is that for a new user, the most "straightforward" way to do it would likely involve hard-coding all variables into a local to use in a loop (as other answers suggest), or referencing them using a wildcard and writing a separate loop for each group. See the example below on how you might use a wildcard:
clear *
sysuse auto
/* Rename variables to match your .dta file and identify groups */
rename (price mpg rep78) (j_r_hp j_p_hp j_m_hp)
rename (headroom trunk weight) (p_c_hp p_r_hp p_m_hp)
rename (length turn displacement foreign) (z_r_hp z_m_hp z_p_hp z_c_hp)
/* Loop over all variables beginning with j and ending hp */
foreach x of varlist j*hp {
    foreach i of varlist j*hp {
        if "`x'" != "`i'" & "`i'" >= "`x'" { // This section ensures you get only
            // unique pairs of x & i
            kap `x' `i'
        }
    }
}
/* Loop over all variables beginning with p and ending hp */
foreach x of varlist p*hp {
* something involving x
}
* etc.
Now, depending on how many groups you have or how many variables you have, this might not seem straightforward after all.
This brings up the second thing I would like to mention. In cases where hard-coding many variables or many repeated commands becomes cumbersome, I tend to favor a programmatic solution. This will often involve writing more code up front, but in many cases tends to be at least quasi-generalizable, and will allow you to easily evaluate hundreds of variables if you ever have the need without having to write them all out.
The code below uses the returned results from describe, along with some foreach loops and some extended macro functions to execute the kappa command over your variables without having to store them in a local manually.
clear *
sysuse auto
rename (price mpg rep78) (j_r_hp j_p_hp j_m_hp)
rename (headroom trunk weight) (p_c_hp p_r_hp p_m_hp)
rename (length turn displacement foreign) (z_r_hp z_m_hp z_p_hp z_c_hp)
/*
use gear_ratio as an arbitrary weight, order it first to easily extract
from the local containing varlist
*/
order gear_ratio, first
qui describe, varlist
local Varlist `r(varlist)' // store varlist in a local macro
preserve // preserve data so changes can be reverted later
foreach x of local Varlist {
    capture confirm numeric variable `x'
    if _rc {
        drop `x' // keep only numeric variables to use in kappa
    }
}
qui describe, varlist // refresh the local macro, now holding only numeric variables
local Varlist `r(varlist)'
gettoken weight vars : Varlist // gear_ratio was ordered first, so split it off as the weight; the rest is the analysis varlist
foreach x of local vars {
    foreach i of local vars {
        if "`x'" != "`i'" & "`i'" >= "`x'" {
            gettoken leftx : x, parse("_")
            gettoken lefti : i, parse("_")
            if "`leftx'" == "`lefti'" {
                kap `x' `i'
            }
        }
    }
}
restore
There of course will be a learning curve here for new users but I've found the use of macros, loops and returned results to be wonderfully effective in adding flexibility to my programs and do files - I would highly suggest anybody using Stata at least studies the basics of these three topics.
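As a tiny illustration of the returned-results part of that advice (a sketch continuing the example above, where price is now named j_r_hp):
* summarize leaves r(mean), r(sd), r(N), ... behind for later use
quietly summarize j_r_hp
display "mean = " r(mean) "   sd = " r(sd) "   N = " r(N)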

string comparison against factors in Stata

Suppose I have a factor variable with labels "a" "b" and "c" and want to see which observations have a label of "b". Stata refuses to parse
gen isb = myfactor == "b"
Sure, there is literally a "type mismatch", since my factor is encoded as an integer and so cannot be compared to the string "b". However, it wouldn't kill Stata to (i) perform the obvious parse or (ii) provide a translator function so I can write the comparison as label(myfactor) == "b". Using decode to (re)create a string variable defeats the purpose of encoding, which is to save space and make computations more efficient, right?
I hadn't really expected the comparison above to work, but I at least figured there would be a one- or two-line approach. Here is what I have found so far. There is a nice macro ("extended") function that maps the other way (from an integer to a label, seen below as local labi: label ...). Here's the solution using it:
// sample data
clear
input str5 mystr int mynum
a 5
b 5
b 6
c 4
end
encode mystr, gen(myfactor)
// first, how many groups are there?
by myfactor, sort: gen ng = _n == 1
replace ng = sum(ng)
scalar ng = ng[_N]
drop ng
// now, which code corresponds to "b"?
forvalues i = 1/`=ng' {
    local labi : label myfactor `i'
    if "b" == "`labi'" {
        scalar bcode = `i'
        continue, break // exit the loop once the code for "b" is found
    }
}
di bcode
The second step is what irks me, but I'm sure there's a also faster, more idiomatic way of performing the first step. Can I grab the length of the label vector, for example?
An example:
clear all
set more off
sysuse auto
gen isdom = 1 if foreign == "Domestic":`:value label foreign'
list foreign isdom in 1/60
This creates a variable called isdom and it will equal 1 if foreign's value label is equal to "Domestic". It uses an extended macro function.
From [U] 18.3.8 Macro expressions:
Also, typing
command that makes reference to `:extended macro function'
is equivalent to
local macroname : extended macro function
command that makes reference to `macroname'
This explains one of the two colons in the offered syntax. The other can be explained by
... to specify value labels directly in an expression, rather than through
the underlying numeric value ... You specify the label in double quotes
(""), followed by a colon (:), followed by the name of the value
label.
The quote is from Stata tip 14: Using value labels in expressions, by Kenneth Higbee, The Stata Journal (2004). Freely available at http://www.stata-journal.com/sjpdf.html?articlenum=dm0009
Edit
On computing the number of distinct observations, another way is:
by myfactor, sort: gen ng = _n == 1
count if ng
scalar sc_ng = r(N)
display sc_ng
But yours is fine. In fact, it is documented here: http://www.stata.com/support/faqs/data-management/number-of-distinct-observations/, along with more methods and comments.
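If you do want the numeric code itself, as in your forvalues loop, levelsof offers one more route that skips counting the groups entirely (a sketch):
levelsof myfactor, local(levels)   // distinct values of myfactor, also left in r(levels)
foreach l of local levels {
    if "`: label myfactor `l''" == "b" scalar bcode = `l'
}
display bcode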

Implementing post / pre increment / decrement when translating to Lua

I'm writing an LSL-to-Lua translator, and I'm having all sorts of trouble implementing incrementing and decrementing operators. LSL has such things using the usual C-like syntax (x++, x--, ++x, --x), but Lua does not. Just to avoid massive amounts of typing, I refer to these sorts of operators as "crements". In the code below, I'll use "..." to represent other parts of the expression.
... x += 1 ...
Won't work, because Lua only has simple assignment.
... x = x + 1 ...
Won't work, because that's a statement, and Lua can't use statements in expressions. LSL can use crements in expressions.
function preIncrement(x) x = x + 1; return x; end
... preIncrement(x) ...
While it does provide the correct value in the expression, Lua passes numbers by value, so the original variable is not changed. If I could get this to actually change the variable, then all would be good. Messing with the environment might not be such a good idea; I don't know what scope x is in. I think I'll investigate that next. The translator could output scope details.
Assuming the above function exists -
... x = preIncrement(x) ...
Won't work, for the "it's a statement" reason.
Other solutions start to get really messy.
x = preIncrement(x)
... x ...
Works fine, except when the original LSL code is something like this -
while (doOneThing(x++))
{
doOtherThing(x);
}
Which becomes a whole can of worms. Using tables in the function -
function preIncrement(x) x[1] = x[1] + 1; return x[1]; end
temp = {x}
... preincrement(temp) ...
x = temp[1]
Is even messier, and has the same problems.
Starting to look like I might have to actually analyse the surrounding code instead of just doing simple translations to sort out what the correct way to implement any given crement will be. Anybody got any simple ideas?
I think to really do this properly you're going to have to do some more detailed analysis and split some expressions into multiple statements, although many can probably be translated pretty straightforwardly.
Note that at least in C, you can delay post-increments/decrements to the next "sequence point", and put pre-increments/decrements before the previous sequence point; sequence points are only located in a few places: between statements, at "short-circuit operators" (&& and ||), etc. (more info here)
So it's fine to replace x = *y++ + z * f(); with { x = *y + z * f(); y = y + 1; } - the user isn't allowed to assume that y will be incremented before anything else in the statement, only that the value used in *y will be y's value before it's incremented. Similarly, x = *--y + z * f(); can be replaced with { y = y - 1; x = *y + z * f(); }.
Lua is designed to be pretty much impervious to implementations of this sort of thing. It could be handled as a compiler/interpreter issue, since the interpreter can know that variables only change when a statement is executed.
There's no way to implement this kind of thing in Lua. Not in the general case. You could do it for global variables by passing a string to the increment function. But obviously it wouldn't work for locals, or for variables that are in a table that is itself global.
Lua doesn't want you to do it; it's best to find a way to work within the restriction. And that means code analysis.
Your proposed solution will only work when your Lua variables are all global. Unless this is something LSL also does, you will run into trouble translating LSL programs that use variables with the same name in different scopes.
Lua is only able to modify one lvalue per statement - tables passed to functions are the only exception to this rule. You could use a local table to store all locals, and that would help you out with the pre-crements; they can be evaluated before the expression that contains them is evaluated. But the post-crements have to be evaluated later on, which is simply not possible in Lua - at least not without some ugly code involving anonymous functions (a sketch follows below).
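For what that anonymous-function trick can look like for a post-increment of a local (a sketch, not a recommendation; call_with is just a stand-in for whatever the value gets passed to):
-- the closure updates the local as a side effect and returns the old value
local x = 5
local function call_with(v) return v * 10 end
local y = call_with((function() local old = x; x = x + 1; return old end)())
-- y is 50 (the old value of x was used), and x is now 6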
So you have one option: you must accept that some LSL statements will get translated into several Lua statements.
Say you have an LSL function that uses crements like this:
f(integer x) {
integer y = x + x++;
return (y + ++y)
}
You can translate it to Lua like this:
function f(x)
    local post_incremented_x = x + 1 -- extra statement 1 for the post increment
    local y = x + post_incremented_x
    x = post_incremented_x -- extra statement 2 for the post increment
    local pre_incremented_y = y + 1
    return y + pre_incremented_y
    -- y = pre_incremented_y -- this line would never be executed (Lua won't even
    -- accept a statement after return, so it is left as a comment)
end
So you will basically have to add two statements per crement used in your statements. For complex structures, that will mean calculating the order in which the expressions are evaluated.
For what it's worth, I like having post-decrements and pre-decrements as individual statements in languages, but I consider it a flaw of the language when they can also be used as expressions. The syntactic sugar quickly becomes semantic diabetes.
After some research and thinking I've come up with an idea that might work.
For globals -
function preIncrement(x)
    _G[x] = _G[x] + 1
    return _G[x]
end
... preIncrement("x") ...
For locals and function parameters (which are locals too), I know at the time I'm parsing the crement that it is local, so I can store four flags in the variable's AST structure to tell me which of the four crements is being used. Then, when it comes time to output the variable's definition, I can output something like this -
local x;
function preIncrement_x() x = x + 1; return x; end
function postDecrement_x() local y = x; x = x - 1; return y; end
... preIncrement_x() ...
Most of your assessment treats the job as hard-passing data types from one language into the other and calling that a "translator", and in doing so you miss regex-style pattern matching, which is far better supported in Lua than in LSL. Since the LSL code is being passed into Lua, try using Lua's pattern-matching functions alongside the others; that would make the work more of a translation than a hard pass.
Yes, I know this was asked a while ago, but for other readers of this topic: never forget the environment you are working in, and use what it gives you to the best of your ability.

Writing F# code to parse "2 + 2" into code

Extremely just-started-yesterday new to F#.
What I want: To write code that parses the string "2 + 2" into (using as an example code from the tutorial project) Expr.Add(Expr.Num 2, Expr.Num 2) for evaluation. Some help to at least point me in the right direction or tell me it's too complex for my first F# project. (This is how I learn things: By bashing my head against stuff that's hard)
What I have: My best guess at code to extract the numbers. Probably horribly off base. Also, a lack of clue.
let script = "2 + 2";
let rec scriptParse xs =
    match xs with
    | [] -> (double)0
    | y::ys -> (double)y
let split = (script.Split([|' '|]))
let f x = (split[x]) // "This code is not a function and cannot be applied."
let list = [ for x in 0..script.Length -> f x ]
let result = scriptParse
Thanks.
The immediate issue that you're running into is that split is an array of strings. To access an element of this array, the syntax is split.[x], not split[x] (which would apply split to the singleton list [x], assuming it were a function).
Here are a few other issues:
Your definition of list is probably wrong: x ranges up to the length of script, not the length of the array split. If you want to convert an array or other sequence to a list you can just use List.ofSeq or Seq.toList instead of an explicit list comprehension [...].
Your "casts" to double are a bit odd - that's not the right syntax for performing conversions in F#, although it will work in this case. double is a function, so the parentheses are unnecessary and what you are doing is really calling double 0 and double y. You should just use 0.0 for the first case, and in the second case, it's unclear what you are converting from.
In general, it would probably be better to do a bit more design up front to decide what your overall strategy will be, since it's not clear to me that you'll be able to piece together a working parser based on your current approach. There are several well known techniques for writing a parser - are you trying to use a particular approach?
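If it helps to see one concrete direction, here is a minimal sketch (the Expr type here is my own stand-in, not necessarily the tutorial's) that handles exactly the single form "number + number" by splitting on spaces:
type Expr =
    | Num of int
    | Add of Expr * Expr

let parseSimple (s: string) =
    match s.Split(' ') with
    | [| a; "+"; b |] -> Expr.Add (Expr.Num (int a), Expr.Num (int b))
    | _ -> failwith "expected something of the form \"number + number\""

let rec eval expr =
    match expr with
    | Num n -> n
    | Add (l, r) -> eval l + eval r

// parseSimple "2 + 2" gives Add (Num 2, Num 2); eval (parseSimple "2 + 2") gives 4
A real parser would tokenize first and handle precedence and nesting, which is where well-known techniques such as recursive descent or parser combinators come in.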
