Stata: using foreach to rename numeric variables

I have a large dataset where subsets of variables have been entered with the same prefix, followed by an underscore and some details. They are all binary YN and the variables are all doubles. For example, I have the variables onsite_healthclinic and onsite_CBO where values can only be 1 or 0.
I want to rename them all after the survey question they correspond to (so the above variables would become q0052_healthclinic and q0052_CBO), but the code below, which uses substr, (obviously) gives a type mismatch:
foreach var in onsite_healthclinic onsite_CBO {
local new = substr(`var', 8, .)
rename `new' q0052_`new'
}
Is there a command other than substr that I can use, so that I don't have to either a) convert all of the variables to strings first, or b) rename them all manually (there are ~20 in each subset, so while doable, it's a waste of time)?

There is no need for a loop here at all. Although the essential answer is one line long I give here a complete, self-contained answer.
clear
set obs 1
foreach v in onsite_healthclinic onsite_CBO {
gen `v' = 1
}
rename onsite_* q0052_*
describe, fullnames
The need to ask suggests that you have not yet studied the help on renaming groups of variables (help rename group).

Will this work?
foreach var in onsite_healthclinic onsite_CBO {
local new = substr("`var'", 8, .)
rename onsite_`new' q0052_`new'
}
I added quotes around the reference to the local macro var in the substr function, so that its contents are treated as a string rather than as a variable name, and added onsite_ to the rename. That seemed to work.
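For context, the original loop fails because substr(`var', 8, .) expands to substr(onsite_healthclinic, 8, .), so substr receives the numeric variable itself rather than its name. If a loop is wanted anyway, one possible sketch (not taken from either answer) uses the subinstr extended macro function, which works on the name held in the macro. The toy data below are only there to make the example self-contained.

```stata
clear
set obs 1
foreach v in onsite_healthclinic onsite_CBO {
    gen `v' = 1
}

* Swap the prefix inside the macro, then rename the variable
foreach var of varlist onsite_* {
    local new : subinstr local var "onsite_" "q0052_"
    rename `var' `new'
}
describe, fullnames
```

The one-line rename onsite_* q0052_* remains the simpler choice; the loop is only useful if other per-variable work has to happen alongside the rename.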

Related

I am looking for a Lua find and replace logic

I started working with Lua scripting a week ago. I have a Lua file in which logic needs to be written for a certain condition.
When the condition is triggered, it iterates over one of the fields and changes its value from
(ABC123-XYZ) to
(ABC123#1-XYZ), and the number keeps increasing with every iteration (ABC123#2-XYZ).
I need a function that removes the # and the number following it, changing the value back to (ABC123-XYZ). Looking for any advice!
Edit 1:
Below is the updated code, written thanks to @Piglet.
I have another scenario, in which there are two hashes in the variable.
local x = 'BUS144611111-PNB_00#80901#1555-122TRNHUBUS'
local b = x:gsub("#%d+","")
function remove_char(a)
  a = a:gsub("#%d+","")
  return a
end
if string.match(x,"#") then
  print('function')
  print(remove_char(x))
else
  print(x)
end
Expected output should be
x = 'BUS144611111-PNB_00#80901-122TRNHUBUS' for the aforesaid variable
local a = "ABC123#1-XYZ"
local b = a:gsub("#%d+", "")
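This will remove any # followed by one or more digits from your string. Note that gsub also returns a second value, the number of substitutions made, which the assignment to b discards.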

foreach command in Stata

I am using panel data, where the variable countrynum is the country number for 138 countries and icr is the independent variable. To conduct a poolability test I have to run the below code to get the variables icr_1, icr_2, icr_3 ... icr_138.
However, the code only generates icr_1. Can someone help me understand why? I need all 138 variables.
xi, prefix(C) i.countrynum
gen Ccountrynum_1=1 if countrynum==1
replace Ccountrynum_1=0 if countrynum!=1
foreach var of varlist icr {
foreach num of numlist 1(1)138{
gen `var'_`num'=`var'* Ccountrynum_`num'
}
}
There are some things in your code that I would do differently, but I don't see anything that brings up an error. Rather than debugging code I think it's more useful to suggest an easier way for what you seem to be doing:
separate icr, by(countrynum)
xi is an older command which has been superseded by factor variable notation, so you only need xi in case you're using an older command that doesn't support this, which I think is not the case here.
To do a poolability test as I understand it you can run a regression with i.countrynum like this:
reg y x1 x2 x... i.countrynum
testparm i.countrynum
The output of testparm will tell you whether the country dummies are jointly significant.
I don't follow this easily. Let's first note that
gen Ccountrynum_1=1 if countrynum==1
replace Ccountrynum_1=0 if countrynum!=1
simplifies to
gen Ccountrynum_1 = countrynum == 1
That said, the double loop
foreach var of varlist icr {
foreach num of numlist 1(1)138{
gen `var'_`num'=`var'* Ccountrynum_`num'
}
}
simplifies to a single loop
forval num = 1/138 {
gen icr_`num' = icr * Ccountrynum_`num'
}
That said, it's hard to understand why that code should be expected to work as you only explain the generation of Ccountrynum_1.
It's really unusual to need that number of extra variables. In addition to @Wouter Wakker's suggestion, tabulate, generate() creates indicator variables without a loop whenever they are essential.
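As a rough illustration of the tabulate, generate() route, here is a minimal sketch on toy data with three countries instead of 138. The stub C_ and the toy values are assumptions made for the example. Note that tabulate numbers the indicators by the sorted order of the distinct values, which matches the country codes here only because they run 1, 2, 3.

```stata
clear
set obs 6
gen countrynum = ceil(_n/2)      // toy data: countries 1, 2, 3
gen icr = _n                     // toy values for icr

* One indicator per distinct value of countrynum: C_1, C_2, C_3
tabulate countrynum, generate(C_)

forval num = 1/3 {
    gen icr_`num' = icr * C_`num'
}
list
```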

Modify values programmatically SPSS

I have a file with more than 250 variables and more than 100 cases. Some of these variables have an error in decimal dot (20445.12 should be 2.044512).
I want to fix these data programmatically. I found a possible way using the Visual Basic editor provided by SPSS, but I have no experience with it.
How can I select a range of cells in this language?
How can I store the cell once modified its data?
--- EDITED NEW DATA ----
Thank you for your fast reply.
The problem now is the number of digits that each number has. For example, erroneous data could have the following format:
Case A) 43998 (five digits) ---> 4.3998 as correct value.
Case B) 4399 (four digits) ---> 4.3990 as correct value, but parsed as 0.4399 because 0 has been removed when file was created.
Is there any way, like:
IF (NUM < 10000) THEN NUM = NUM / 1000 ELSE NUM = NUM / 10000
Or something like IF (Number_of_digits(NUM)) THEN ...
Thank you.
There's no need for a VB script. Go this way instead:
Open a syntax window and paste the following code:
do repeat vr=var1 var2 var3 var4.
compute vr=vr/10000.
end repeat.
save outfile="filepath\My corrected data.sav".
exe.
Replace var1 var2 var3 var4 with the names of the actual variables you need to change. For variables that are contiguous in the file you may use var1 to var4.
Replace vr=vr/10000 with whatever mathematical calculation you would like to use to correct the data.
Replace "filepath\My corrected data.sav" with your path and file name.
WARNING: this syntax will change the data in your file. You should make sure to create a backup of your original in addition to saving the corrected data to a new file.

Stata: perform a foreach loop to calculate kappa across a large data file

I have a data file in Stata with 50 variables
j-r-hp j-p-hp j-m-hp p-c-hp p-r-hp p-p-hp p-m-hp ... etc,
I want to perform a weighted kappa between pairs, so that the first might be
kap j-r-hp j-p-hp, wgt(w2)
and the next would be
kap j-r-hp j-m-hp, wgt(w2)
I am new to Stata. Is there a straightforward way to use a loop for this, like a foreach loop?
Your variable names are not legal names in Stata, so I've changed the hyphens to underscores in the example below. Also, I don't know what it means to 'perform a weighted kappa', so my answer uses random normal variables and the corr[elate] command. You can use the results that Stata leaves behind in r() (see return list) to gather the results for the separate analyses.
The idea is to gather the variables in a list using a local, then to loop over each element in that list (but skipping the repeated pairs using continue). If you have many variables with structured names, you could instead use ds, which leaves r(varlist) in r(). Have a look at the help file for macros (help macro and help extended_fcn), especially the section on 'Macro extended functions for parsing'. Hope this helps.
clear
set obs 100
local vars j_r_hp j_p_hp j_m_hp p_c_hp p_r_hp p_p_hp p_m_hp
foreach var of local vars {
gen `var'=rnormal()
}
forval ii=1/`: word count `vars'' {
forval jj=1/`: word count `vars'' {
if `ii'<`jj' continue
corr `: word `ii' of `vars'' `: word `jj' of `vars''
}
}
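To make the remark about r() concrete, here is one possible sketch that stores each pairwise result in a matrix. It relies on correlate leaving the correlation of the first two variables in r(rho). The matrix name R, the seed and the reduced list of three variables are assumptions made for the example. For kap you would pick the relevant stored result from return list instead.

```stata
clear
set obs 100
set seed 12345
local vars j_r_hp j_p_hp j_m_hp
foreach var of local vars {
    gen `var' = rnormal()
}

local p : word count `vars'
matrix R = J(`p', `p', .)          // results matrix, initially all missing
matrix rownames R = `vars'
matrix colnames R = `vars'

forval i = 1/`=`p' - 1' {
    forval j = `=`i' + 1'/`p' {
        quietly corr `: word `i' of `vars'' `: word `j' of `vars''
        matrix R[`i', `j'] = r(rho)    // fill the cell above the diagonal
    }
}
matrix list R
```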
You can take advantage of the user-written command tuples (run ssc install tuples):
clear
set more off
*----- example data -----
set obs 100
local vars j_r_hp j_p_hp j_m_hp p_c_hp p_r_hp p_p_hp p_m_hp
foreach var of local vars {
gen `var' = abs(round(rnormal()*100))
}
*----- what you want -----
tuples `vars', min(2) max(2)
forvalues i = 1/`ntuples' {
display _newline(3) "variables `tuple`i''"
kappa `tuple`i''
}
How you get the variables names together to feed them into tuples will depend on the dataset.
This is a variation on the helpful answer by @Matthijs, but it really won't fit well into a comment. The main extra twists are:
1. The use of tokenize to avoid repeated use of word # of. After tokenize the separate words of the argument (here separate variable names) are held in macros 1 up. Thus tokenize a b c puts a in local macro 1, b in local macro 2 and c in local macro 3. Nested macro references are treated exactly like parenthesised expressions in elementary algebra; what is on the inside is evaluated first.
2. Focusing directly on part of the notional matrix of results on one side of the diagonal. The small trick is to ensure that one matrix subscript exceeds the other subscript.
Random normal input doesn't make sense for kap, but you will be using your own data anyway.
clear
set obs 100
local vars j_r_hp j_p_hp j_m_hp p_c_hp p_r_hp p_p_hp p_m_hp
foreach var of local vars {
gen `var' = rnormal()
}
tokenize `vars'
local p : word count `vars'
local pm1 = `p' - 1
forval i = 1/`pm1' {
local ip1 = `i' + 1
forval j = `ip1'/`p' {
di "``i'' and ``j''"
kap ``i'' ``j''
di
}
}
I thought I might add my own answer in addition to highlight a few things.
The first thing to note is that for a new user, the most "straightforward" way to do it would likely involve hard-coding all variables into a local to use in a loop (as other answers suggest), or referencing them using a wildcard and writing more than one loop for each group. See the example below on how you might use a wildcard:
clear *
sysuse auto
/* Rename variables to match your .dta file and identify groups */
rename (price mpg rep78) (j_r_hp j_p_hp j_m_hp)
rename (headroom trunk weight) (p_c_hp p_r_hp p_m_hp)
rename (length turn displacement foreign) (z_r_hp z_m_hp z_p_hp z_c_hp)
/* Loop over all variables beginning with j and ending hp */
foreach x of varlist j*hp {
foreach i of varlist j*hp {
if "`x'" != "`i'" & "`i'" >= "`x'"{ // This section ensures you get only
// unique pairs of x & i
kap `x' `i'
}
}
}
/* Loop over all variables beginning with p and ending hp */
foreach x of varlist p*hp {
* something involving x
}
* etc.
Now, depending on how many groups you have or how many variables you have, this might not seem straightforward after all.
This brings up the second thing I would like to mention. In cases where hard-coding many variables or many repeated commands becomes cumbersome, I tend to favor a programmatic solution. This will often involve writing more code up front, but in many cases tends to be at least quasi-generalizable, and will allow you to easily evaluate hundreds of variables if you ever have the need without having to write them all out.
The code below uses the returned results from describe, along with some foreach loops and some extended macro functions to execute the kappa command over your variables without having to store them in a local manually.
clear *
sysuse auto
rename (price mpg rep78) (j_r_hp j_p_hp j_m_hp)
rename (headroom trunk weight) (p_c_hp p_r_hp p_m_hp)
rename (length turn displacement foreign) (z_r_hp z_m_hp z_p_hp z_c_hp)
/*
use gear_ratio as an arbitrary weight, order it first to easily extract
from the local containing varlist
*/
order gear_ratio, first
qui describe, varlist
local Varlist `r(varlist)' // store varlist in a local macro
preserve // preserve data so changes can be reverted back
foreach x of local Varlist {
capture confirm numeric variable `x'
if _rc {
drop `x' // Keep only numeric variables to use in kappa
}
}
qui describe, varlist // replace the local macro varlist with now numeric only variables
local Varlist `r(varlist)'
local weight gear_ratio // name of the weight variable
local vars : list Varlist - weight // remove the weight variable from the analysis varlist
foreach x of local vars {
foreach i of local vars {
if "`x'" != "`i'" & "`i'" >= "`x'" {
gettoken leftx : x, parse("_")
gettoken lefti : i, parse("_")
if "`leftx'" == "`lefti'" {
kap `x' `i'
}
}
}
}
restore
There of course will be a learning curve here for new users but I've found the use of macros, loops and returned results to be wonderfully effective in adding flexibility to my programs and do files - I would highly suggest anybody using Stata at least studies the basics of these three topics.

string comparison against factors in Stata

Suppose I have a factor variable with labels "a" "b" and "c" and want to see which observations have a label of "b". Stata refuses to parse
gen isb = myfactor == "b"
Sure, there is literally a "type mismatch", since my factor is encoded as an integer and so cannot be compared to the string "b". However, it wouldn't kill Stata to (i) perform the obvious parse or (ii) provide a translator function so I can write the comparison as label(myfactor) == "b". Using decode to (re)create a string variable defeats the purpose of encoding, which is to save space and make computations more efficient, right?
I hadn't really expected the comparison above to work, but I at least figured there would be a one- or two-line approach. Here is what I have found so far. There is a nice macro ("extended") function that maps the other way (from an integer to a label, seen below as local labi: label ...). Here's the solution using it:
// sample data
clear
input str5 mystr int mynum
a 5
b 5
b 6
c 4
end
encode mystr, gen(myfactor)
// first, how many groups are there?
by myfactor, sort: gen ng = _n == 1
replace ng = sum(ng)
scalar ng = ng[_N]
drop ng
// now, which code corresponds to "b"?
forvalues i = 1/`=ng'{
local labi: label myfactor `i'
if "b" == "`labi'" {
scalar bcode = `i'
continue, break
}
}
di bcode
The second step is what irks me, but I'm sure there's also a faster, more idiomatic way of performing the first step. Can I grab the length of the label vector, for example?
An example:
clear all
set more off
sysuse auto
gen isdom = 1 if foreign == "Domestic":`:value label foreign'
list foreign isdom in 1/60
This creates a variable called isdom, which equals 1 if foreign's value label is equal to "Domestic" and is missing otherwise. It uses an extended macro function.
From [U] 18.3.8 Macro expressions:
Also, typing
command that makes reference to `:extended macro function'
is equivalent to
local macroname : extended macro function
command that makes reference to `macroname'
This explains one of the two : in the offered syntax. The other can be explained by
... to specify value labels directly in an expression, rather than through
the underlying numeric value ... You specify the label in double quotes
(""), followed by a colon (:), followed by the name of the value
label.
The quote is from Stata tip 14: Using value labels in expressions, by Kenneth Higbee, The Stata Journal (2004). Freely available at http://www.stata-journal.com/sjpdf.html?articlenum=dm0009
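Applied to the question's own example, the same syntax might look like the sketch below. It assumes encode's default behaviour of naming the value label after the new variable (here myfactor), so the label name is written out directly. The `:value label myfactor' form would look it up instead. isb is the indicator the question asked for.

```stata
clear
input str5 mystr int mynum
a 5
b 5
b 6
c 4
end
encode mystr, gen(myfactor)

* 1 where the label of myfactor is "b", 0 otherwise
gen isb = myfactor == "b":myfactor
list mystr myfactor isb
```

Unlike the isdom example, no if qualifier is used here, so isb is 0 rather than missing for the other observations.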
Edit
On computing the number of distinct observations, another way is:
by myfactor, sort: gen ng = _n == 1
count if ng
scalar sc_ng = r(N)
display sc_ng
But yours is fine. In fact, it is documented here: http://www.stata.com/support/faqs/data-management/number-of-distinct-observations/, along with more methods and comments.
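One further option for the count, offered as a sketch rather than as the recommended method, is to let tabulate do the work. A one-way tabulate leaves the number of rows, that is the number of distinct values, in r(r). The auto data and the scalar name sc_ng are used only for illustration.

```stata
sysuse auto, clear
quietly tabulate foreign
scalar sc_ng = r(r)      // number of distinct values of foreign
display sc_ng
```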
