Generate identity variable from dummy indicators in Stata - foreach

I am working on somebody's dataset in Stata that uses dummy variables to indicate the subject id, like the following:
variable name     variable label
country_dummy1    Afghanistan
country_dummy2    Albania
country_dummy3    Algeria
...
This makes the dataset very hard to work with, and I am trying to generate a subject id variable (country) from the dummies so that it looks like this:
country        country_dummy1   country_dummy2   country_dummy3
Afghanistan    1                0                0
Albania        0                1                0
Algeria        0                0                1
I wrote the following command:
gen country = "."
foreach x of varlist country_dummy1-country_dummy175 {
local z : variable label `x'
replace country = `z' if `x'==1
}
Stata produced the following error message:
Afghanistan not found
r(111);
I have not been able to identify why this occurred.

You need
gen country = ""
foreach x of varlist country_dummy1-country_dummy175 {
local z : variable label `x'
replace country = "`z'" if `x'==1
}
Note that Stata does not treat "." as a missing string value; the missing string is the empty string "". Your error was that if you do not mark the value as a literal string with "", Stata looks for a variable with the name you specify. In your case, Afghanistan would be a legal variable name, but you have no such variable: hence the error. Countries with spaces in their names would be problematic for other reasons as well, but the command would almost always fail for the same reason.
This should work too:
gen country = ""
foreach x of varlist country_dummy1-country_dummy175 {
replace country = "`: variable label `x''" if `x'
}
You could slap quietly on the foreach to avoid 175 messages from the replace.
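The logic of the loop can also be sketched outside Stata. The following is a minimal illustration in Python rather than Stata, and the labels dictionary and the two rows are made-up stand-ins for the real dataset. It shows the same two points: the new variable starts as an empty string, and the label is assigned as a string value rather than being looked up as a name.

```python
# Hypothetical stand-in for the dataset: variable labels and two observations.
labels = {
    "country_dummy1": "Afghanistan",
    "country_dummy2": "Albania",
    "country_dummy3": "Algeria",
}
rows = [
    {"country_dummy1": 1, "country_dummy2": 0, "country_dummy3": 0},
    {"country_dummy1": 0, "country_dummy2": 0, "country_dummy3": 1},
]

for row in rows:
    row["country"] = ""  # empty string, the analogue of gen country = ""
    for var, label in labels.items():
        if row[var] == 1:
            row["country"] = label  # assign the label text itself, as a string

countries = [row["country"] for row in rows]
print(countries)  # ['Afghanistan', 'Algeria']
```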


How do I remove point symbol from the decimal number?

I'm trying to take a decimal number as input and output all of its digits without the point symbol.
Example input: 123.4
Wanted output 1234
The problem is that when I convert the decimal number into a string and try to remove "." using :gsub('%.', ''), it removes the point symbol but outputs 1234 1 .
I have tried :gsub('.', '') as well but it outputs 5.
I'm clueless where those numbers come from.
Use this syntax to get what you want and discard/ignore what you don't need...
local y = 123.4
-- Remove decimal point or comma here
local str, matches = tostring(y):gsub('[.,]', '')
-- str holds the first return value
-- The second return value goes to: matches
-- So output only the string...
print(str) -- Output: 1234
-- Or/And return it...
return str
There are two issues at play here:
1. string.gsub returns two values, the resulting string and the number of substitutions. When you pass the results of gsub to print, both will be printed. Solve this by either assigning only the first return value to a variable (more explicit) or surrounding the gsub call with parentheses.
2. . is a pattern item that matches any character. Removing all characters will leave you with the empty string; the number of substitutions - 5 in your example - will be the number of characters. To match the literal dot, either escape it using the percent sign (%.) or enclose it within a character set ([.]), possibly adding further decimal separators ([.,] as in koyaanisqatsi's answer).
Fixed code:
local y = 123.4
local str = tostring(y):gsub("%.", "") -- discards the number of substitutions
print(str)
This is unreliable, however, since tostring guarantees no particular output format; it may well emit numbers in scientific notation (which it does for very large or very small numbers), causing your code to break. A more elegant solution to the problem of shifting the number so that it becomes an integer would be to multiply the number by 10 until the fractional part becomes zero:
local y = 123.4
while y % 1 ~= 0 do y = y * 10 end
print(y) -- note: y is the number 1234 rather than the string "1234" here
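The pitfalls described in this answer are not specific to Lua. As a rough analogue, sketched in Python rather than Lua, re.subn also returns both the new string and the substitution count, an unescaped dot also matches any character, and the default string form of a very large float also switches to scientific notation. The variable names are only illustrative.

```python
import re

text = str(123.4)  # "123.4"

# An unescaped dot matches any character, so every character is removed
# and the count equals the length of the string.
stripped_all, n_all = re.subn(r".", "", text)

# A character class matches only the literal separators.
digits, n_sep = re.subn(r"[.,]", "", text)

print(repr(stripped_all), n_all)  # '' 5
print(digits, n_sep)              # 1234 1

# Very large floats are rendered in scientific notation, so stripping
# the dot from the string form no longer yields the plain digits.
big = str(1e20)
print(big)  # 1e+20
```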

lua match everything after a tag in a string

The string is like this:
TEMPLATES="!$TEMPLATE templatename manufacturer model mode\n$TEMPLATE MacQuantum Wash Basic\n$$MANUFACTURER Martin\n$$MODELNAME Mac Quantum Wash\n$$MODENAME Basic\n"
My way to get strings without tags is:
local sentence=""
for word in string.gmatch(line,"%S+") do
if word ~= tag then
sentence=sentence .. word.." "
end
end
table.insert(tagValues, sentence)
E(tag .." --> "..sentence)
And I get output:
$$MANUFACTURER --> Martin
$$MODELNAME --> Mac Quantum Wash
...
...
But this is not what I want.
I would like to first find the block starting with the $TEMPLATE tag to check whether it is the right block. There are many such blocks in a file that I read line by line. Then I have to get all tags marked with a double $: $$MODELNAME etc.
I have tried it in many ways, but none satisfied me. Perhaps someone has an idea how to solve it?
We are going to use Lua patterns (like regex, but different) inside a function string.gmatch, which creates a loop.
Explanation:
for match in string.gmatch(string, pattern) do print(match) end is a loop that iterates over every match of pattern in string. The pattern I will use is %$+%w+%s[^\n]+
%$+ - At least 1 literal $ ($ is a special character so it needs the % to escape), + means 1 or more. You could match for just one ("%$") if you only need the data of the tag but we want information on how many $ there are so we'll leave that in.
%w+ - match any alphanumeric character, as many as appear in a row.
%s - match a single whitespace character
[^\n]+ - match anything that isn't '\n' (^ means invert), as many as appear in a row.
Once the function hits a \n, it executes the loop on the match and repeats the process.
That leaves us with strings like "$TEMPLATE templatename manufacturer model mode"
We want to extract the $TEMPLATE to its own variable to verify it, so we use string.match(string, pattern) to just return the value found by the pattern in string.
Here's a comprehensive example that should provide everything you're looking for.
templates = "!$TEMPLATE templatename manufacturer model mode\n$TEMPLATE MacQuantum Wash Basic\n$$MANUFACTURER Martin\n$$MODELNAME Mac Quantum Wash\n$$MODENAME Basic\n"
local data = {}
for match in string.gmatch(templates, "%$+%w+%s[^\n]+") do --finds the pattern given in the variable 'templates'
--this function assigns certain data to tags inside table t, which goes inside data.
local t = {}
t.tag = string.match(match, '%w+') --the tag (stuff that comes between a $ and a space)
t.info = string.gsub(match, '%$+%w+%s', "") --value of the tag (stuff that comes after the `$TEMPLATE `). Explanation: %$+ one or more dollar signs, %w+ one or more alphanumeric characters, %s a space. Replace with "" (erase it).
_, t.ds = string.gsub(match, '%$', "") --This function returns two values; the first one is garbage we don't need (hence the throwaway variable _). The second is the number of $s in the string.
table.insert(data, t)
end
for _,tag in pairs(data) do --iterate over every table of data in data.
for key, value in pairs(tag) do
print("Key:", key, "Value:", value) --this will show you data examples (see output)
end
print("-------------")
end
print('--just print the stuff with two dollar signs')
for key, data in pairs(data) do
if data.ds == 2 then --'data' becomes a subtable in table 'data', we evaluate how many dollar signs it recognized.
print(data.tag)
end
end
print("--just print the MODELNAME tag's value")
for key, data in pairs(data) do
if data.tag == "MODELNAME" then --evaluate the tag name.
print(data.info)
end
end
Output:
Key: info Value: templatename manufacturer model mode
Key: ds Value: 1
Key: tag Value: TEMPLATE
-------------
Key: info Value: MacQuantum Wash Basic
Key: ds Value: 1
Key: tag Value: TEMPLATE
-------------
Key: info Value: Martin
Key: ds Value: 2
Key: tag Value: MANUFACTURER
-------------
Key: info Value: Mac Quantum Wash
Key: ds Value: 2
Key: tag Value: MODELNAME
-------------
Key: info Value: Basic
Key: ds Value: 2
Key: tag Value: MODENAME
-------------
--just print the stuff with two dollar signs
MANUFACTURER
MODELNAME
MODENAME
--just print the MODELNAME tag's value
Mac Quantum Wash
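For comparison, the same extraction can be sketched with capture groups, so that the tag, its value and the number of dollar signs come out of a single match. This is only an illustration in Python's re module, not Lua; note that \w in Python also matches an underscore, which Lua's %w does not. The names TAG_LINE, double_dollar_tags and model_names are made up for the sketch.

```python
import re

templates = (
    "!$TEMPLATE templatename manufacturer model mode\n"
    "$TEMPLATE MacQuantum Wash Basic\n"
    "$$MANUFACTURER Martin\n"
    "$$MODELNAME Mac Quantum Wash\n"
    "$$MODENAME Basic\n"
)

# (\$+) dollar signs, (\w+) tag name, \s one whitespace, ([^\n]+) rest of line
TAG_LINE = re.compile(r"(\$+)(\w+)\s([^\n]+)")

data = [
    {"tag": m.group(2), "info": m.group(3), "ds": len(m.group(1))}
    for m in TAG_LINE.finditer(templates)
]

# Tags written with two dollar signs, and the value of the MODELNAME tag.
double_dollar_tags = [d["tag"] for d in data if d["ds"] == 2]
model_names = [d["info"] for d in data if d["tag"] == "MODELNAME"]

print(double_dollar_tags)
print(model_names)
```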

How to test if a string character is a digit?

How can I test if a certain character of a string variable is a digit in SPSS (and then apply some operations, depending on the result)?
So let's say, for example, that I have a variable that reflects the street number. Some street numbers have an additional character at the end, e.g. "12b". Now let's further assume that I extracted the last character (which could be a digit or the additional letter) into a string variable. After that I'd like to check whether this character is a digit or a letter. How can this be done?
I managed to do this with the MAX function, where "mychar" is the character variable to be checked:
COMPUTE digitcheck = (MAX(mychar,"9")="9").
If the content of "mychar" is a digit [0-9] the result of the MAX function will be "9" otherwise the MAX function will return the letter and the equality test fails.
In this way you can also check if a whole string variable contains a letter or not. It looks pretty ugly though, because you have to compare every single character of your string variable.
compute justdigits = (MAX(CHAR.SUBSTR(mystr,1,1), CHAR.SUBSTR(mystr,2,1), CHAR.SUBSTR(mystr,3,1), ..., CHAR.SUBSTR(mystr,n,1),"9")="9").
If you try to turn a letter into a number then it becomes a missing value. Therefore, to test whether a character is a digit, you can do this:
if not missing(number(YourCharacter,f1)) .....
The same test can determine whether a string has only a number in it or not:
compute OnlyNumber=(not missing(number(YourString,f10))).
Note: using the number function on strings that are not numeric will produce warning messages, which you can of course ignore.
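The idea behind the second answer, "try the conversion and treat a failed conversion as missing", can be sketched in general terms. The following is a Python illustration rather than SPSS syntax; to_number_or_none is a hypothetical helper, and None plays the role of the missing value.

```python
def to_number_or_none(text):
    """Return int(text), or None when text is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return None  # the analogue of a missing value

street_number = "12b"
last_char = street_number[-1]

# A letter cannot be converted, so the result is "missing".
last_char_is_digit = to_number_or_none(last_char) is not None
# The same test applied to a whole string.
only_number = to_number_or_none("12") is not None

print(last_char_is_digit, only_number)
```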

splitting address with - using split() results in weird 5 digits

Kinda stumped with the split function, was wondering if someone can help me out.
I have a list of addresses where I'm trying to split the number and street name. These addresses have hyphens in them, for example:
10-09 Main St
So I used =SPLIT(A1, " ") (column A has all the addresses).
The result I get is: 43017 Main St
I could use the menu tab Data >> Split text to columns but I'm trying to automate it using a script. Is there a way to force the split function to treat the data as text and not as a number?
Thank you in advance
This will work with these two user defined functions. Assuming your address is in A1.
function nbr(range) {
  var addr = range.split(" ");
  return addr[0]; // just the number, e.g. "10-09"
}
function street(range) {
  var addr = range.split(" ");
  // everything after the first element, rejoined with single spaces
  return addr.slice(1).join(" ");
}
In B1 put =nbr(A1) and in C1 put =street(A1)
Have you tried changing the column types to flat text? I was more-or-less able to replicate the behaviour when I set the column type to number, but when I changed the type to flat text, it behaved as expected.
Try Layout -> Number -> Flat text.
(Since I'm Dutch, the options might be named slightly different - apologies for that)
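A likely explanation for the five digits is that "10-09" is read as a date (October 9) and shown as its serial day number. The sketch below, in Python rather than Apps Script, assumes the usual spreadsheet convention that serial 0 is 1899-12-30; under that assumption 43017 comes out as 9 October 2017. It also shows the split done purely on text, which never triggers that conversion. Both function names are made up for the illustration.

```python
from datetime import date, timedelta

# Assumption: day 0 of the spreadsheet serial-number scheme is 1899-12-30.
SHEETS_EPOCH = date(1899, 12, 30)

def serial_to_date(serial):
    """Convert a spreadsheet serial day number to a calendar date."""
    return SHEETS_EPOCH + timedelta(days=serial)

def split_address(address):
    """Split on the first space only, keeping both parts as text."""
    number, _, street = address.partition(" ")
    return number, street

print(serial_to_date(43017))           # 2017-10-09
print(split_address("10-09 Main St"))  # ('10-09', 'Main St')
```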

Xtext: enter wrong rule or 'missing RULE_* at'

I want to parse all names from a random text. Names will be formatted like this:
Lastname F.
where F - first letter of first name. So, I created this grammar:
grammar org.xtext.example.mydsl.Article with org.eclipse.xtext.common.Terminals
generate article "http://www.xtext.org/example/mydsl/Article"
Model : {Model}((per += Person)|(words += NON_WS))*;
Person : lastName = NAME firstName = IN;
terminal NAME : ('A'..'Z')('a'..'z')+;
terminal IN : ('A'..'Z')'.';
terminal NON_WS : !(' '|'\t'|'\r'|'\n')+;
It works on this example:
Lastname F. some text. Lastname F.
But it crashes on this one:
Lastname F. some text. New sentence. Lastname F.
^^^^^^^^^ missing RULE_IN at 'sentence.'
How can I check all the tokens before the 'Person' object is created, or before the parser enters the 'Person' rule?
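Lexing is done context-free: once a piece of input is lexed as a NAME, it is always a NAME, wherever it appears. In your example, New is lexed as NAME, so the parser enters Person and then expects an IN token but finds 'sentence.' instead. Allow NAME as an ordinary word as well: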
Model : {Model}((per += Person)|(words += (NON_WS|NAME)))*;
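To see why New ends up as a NAME, here is a toy model of a context-free lexer, written in Python. It is not Xtext's actual lexer: it assumes longest match with ties going to the earlier rule, and it ignores the terminals inherited from org.eclipse.xtext.common.Terminals. The point is only that each token's type is decided from the characters alone, before the parser sees anything.

```python
import re

# Toy terminal rules in grammar order (the earlier rule wins ties on length).
RULES = [
    ("NAME", re.compile(r"[A-Z][a-z]+")),
    ("IN", re.compile(r"[A-Z]\.")),
    ("NON_WS", re.compile(r"[^ \t\r\n]+")),
]

def lex(text):
    """Split text into (token type, lexeme) pairs without any parser context."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos] in " \t\r\n":
            pos += 1  # skip whitespace between tokens
            continue
        best_kind, best_end = None, pos
        for kind, pattern in RULES:
            m = pattern.match(text, pos)
            # Keep a match only if it is strictly longer than the best so far.
            if m and m.end() > best_end:
                best_kind, best_end = kind, m.end()
        tokens.append((best_kind, text[pos:best_end]))
        pos = best_end
    return tokens

tokens = lex("New sentence. Lastname F.")
print(tokens)
```

With the tokens fixed this way, the parser has to accept NAME wherever an ordinary word may appear, which is what the corrected Model rule does.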
