Outsheet to quote and comma delimited text - delimiter

My variables in Stata are of the form:
First Name: Allen
Last Name: Von Schmidt
Birth Year: 1965
County: Cape May
State: New Jersey
First Name: Lee Roy
Last Name: McBride
Birth Year: 1967
County: Cook
State: Illinois
I would like to outsheet them to create quote and comma separated rows in a .txt as:
"Allen","Von Schmidt","1965","Cape May","New Jersey"
"Lee Roy","McBride","1967","Cook","Illinois"
How can I use outsheet (or another command) to do this? Do I need to make the numerics into strings first? Do I need to add commas to each variable first?
I have tried the following:
outsheet first last birth_year county state using FileName.txt, nolabel delim(",")
This seems to work ok except that it does not put the numeric variables inside "".

I don't understand why you want this, but Stata's practice here as elsewhere is that only strings are placed in double quotes. So, to output numeric variables as if they were strings you do need to convert them first to string variables. The tostring command is designed for this.
But this is an awkward thing to do, and on the whole a bad idea.
First, and easier: if you use tostring you change your data, and numeric operations become impossible on the new string variables. That is relatively easy to work around. Just make sure you save your data first before using tostring and then read it back in again after exporting the data. Or use preserve followed by restore.
Second, and more problematic: you need to worry about loss of detail for any numeric variables that are not integer. tostring does have options that help here, but there are no guarantees of keeping every bit unless you get into the nightmare territory of exporting hexadecimal. That's true of outsheet anyway, but a warning should do no harm.
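The loss-of-detail point is not specific to Stata. As a rough illustration in Python (not Stata code; the value 1/3 and the digit counts are arbitrary assumptions), a decimal rendering with too few digits does not read back to the same number, while 17 significant digits or a hexadecimal rendering do:

```python
# Illustration only: Python floats are IEEE 754 doubles.
x = 1 / 3

short = format(x, ".7g")   # short decimal rendering, loses detail
full = format(x, ".17g")   # 17 significant digits are enough for a double
hexa = x.hex()             # exact hexadecimal rendering


def round_trips(text: str) -> bool:
    """Return True if parsing the decimal text gives back exactly x."""
    return float(text) == x


short_ok = round_trips(short)       # the short form does not round-trip
full_ok = round_trips(full)         # the long form does
hex_ok = float.fromhex(hexa) == x   # the hexadecimal form does
```

Whether that matters depends on how the exported file is used; for integer variables such as a birth year, nothing is lost.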
I am aware of the history of tostring, as its original author. I'll put on record that although it is a solution for what you appear to want to do, there are pitfalls as above and I don't recommend this way of working.
It would be better to explain why you think you need to do this. outsheet's export of numerics and strings seems to have worked well for export to other software, not least spreadsheets, over many uses.
P.S. as emphasised elsewhere, Stata does not regard " " as separators. They are delimiters for strings, but not separators for fields (or words in Stata's sense).
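For readers who end up post-processing the export outside Stata, here is a minimal sketch of the layout asked for in the question, using Python's standard csv module. It is not a Stata solution; the rows are the two example records from the question, and the helper name to_quoted_csv is made up for illustration. It also shows the cost mentioned above: once every field is quoted, the file no longer distinguishes numbers from strings.

```python
import csv
import io

# The two example records from the question; the year is numeric on purpose.
rows = [
    ("Allen", "Von Schmidt", 1965, "Cape May", "New Jersey"),
    ("Lee Roy", "McBride", 1967, "Cook", "Illinois"),
]


def to_quoted_csv(records) -> str:
    """Write records as comma-separated text with every field in double quotes."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(records)
    return buffer.getvalue()


text = to_quoted_csv(rows)
```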

Parsing a date token

I would like to parse various SQL literals in ANTLR. Examples of the literal would be:
DATE '2020-01-01'
DATE '1992-11-23'
DATE '2014-01-01'
Would it be better to do the 'bare minimum' at the parsing stage and just put in something like:
date_literal
: 'DATE' STRING
;
Or, should I be doing any validation within the parser as well, for example something like:
date_literal
: 'DATE' DIG DIG DIG DIG '-' DIG DIG '-' DIG DIG
;
If I do the latter I'll still need to validate...even if I do a longer regex, I'll need to check things like the number of days in the month, leap years, a valid date range, etc.
What is usually the preferable method to do this? As in, how much 'validation' do you want to do in your grammar and how much in the listeners where the actual programming would be done? Additionally, are there any performance differences between doing (small) validations 'within the grammar' vs doing it in listeners/followers?
These are actually two slightly different syntaxes (the second does not require the date to be surrounded by ' characters).
Based on your examples, that may be an oversight, so I'll assume you mean both to require the ' characters, and that your STRINGs are ' delimited.
It's a design choice, but a couple of factors to consider.
If you use the more specific grammar, then, if the user input doesn't match, you'll get the default ANTLR error message (which will be "pretty good for a generated tool", but probably a bit opaque to your user).
As you say, you'll still have to perform further checks.
I lean toward keeping the grammar as simple as possible and doing more validation in a listener (maybe a visitor). This allows you to be as clear with your error messages as possible.
The only reason I see not to use the 'DATE' STRING rule would be if there is some other string content that would NOT be a date_literal, but would be some other, valid syntax in your language. If the content could only ever be an invalid date literal, I'd use your simple rule and do the check afterwards.
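As a sketch of what the check behind the simple 'DATE' STRING rule might look like, here is the validation step written as a plain Python function. It is not ANTLR listener code; the function name validate_date_literal and the error wording are assumptions, and it takes the raw STRING token text, including its quotes, as input.

```python
from datetime import date


def validate_date_literal(string_token: str) -> date:
    """Validate the STRING part of DATE '...' and return it as a date.

    string_token is the raw token text including its single quotes,
    e.g. "'2020-01-01'". Raises ValueError with a specific message.
    """
    if len(string_token) < 2 or string_token[0] != "'" or string_token[-1] != "'":
        raise ValueError(f"date literal must be single-quoted: {string_token}")
    body = string_token[1:-1]
    parts = body.split("-")
    shape_ok = (
        len(parts) == 3
        and [len(p) for p in parts] == [4, 2, 2]
        and all(p.isascii() and p.isdigit() for p in parts)
    )
    if not shape_ok:
        raise ValueError(f"expected YYYY-MM-DD, got {body!r}")
    year, month, day = (int(p) for p in parts)
    try:
        # date() itself rejects month 13, day 31 in a 30-day month,
        # 29 February in a non-leap year, and years outside its range.
        return date(year, month, day)
    except ValueError as exc:
        raise ValueError(f"{body!r} is not a calendar date: {exc}") from None
```

The grammar only has to recognise the token; each failure mode gets its own message here, which is the point of keeping the rule simple.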

Check values existence using spss syntax

I need to check the existence of values based on some conditions.
For example, I have 3 variables: varA, varB and varC. varC should be non-empty only if varA>varB (the condition).
I normally use syntax like the following to flag each variable, and then run a frequency on the flag to see if there are errors:
if missing(varC) and (varA>varB) ck_varC=1.
if not(missing(varC)) and not(varA>varB) ck_varC=2.
exe.
fre ck_varC.
exe.
I had some errors when the condition became complex and when the condition contained missing() or other functions, but I could have made a mistake.
Do you think there is an easier way of doing these checks?
Thanks in advance.
EDIT: here is an example of what I mean. Think of a questionnaire with some routing: you ask everyone their age; if they are between 17 and 44, you ask them if they work; if they work, you ask them how many hours.
I have an Excel tool where I put down all variables with all conditions. It then generates the syntax in the example, with the same structure for all variables, considering both situations: we have a value that shouldn't be there, or we don't have a value that should be there.
Is there an easier way of doing that? Is this structure always valid, no matter what the condition is?
In SPSS, missing values are not numbers. You need to explicitly program those scenarios as well. You have varC covered (partially), but no scenario where varA or varB has missing data is covered.
(As good practice, maybe you should initialize your check variable as sysmis or 0, using syntax):
numeric ck_varC (f1.0).
compute ck_varC=0.
if missing(varC) and (varA>varB) ck_varC=1.
if not(missing(varC)) and not(varA>varB) ck_varC=2.
***additional conditional scenarios go here:.
if missing(varA) or missing(varB) ck_varC=3.
...
fre ck_varC.
By the way - you do not need any of the exe. commands if you are going to run your syntax as a whole.
Later Edit, after the poster updated the question:
Your syntax would be something like this. Note the use of the range function, which is not mandatory, but might be useful for you in the future.
I am also assuming that work is a string variable, so its values need to be referenced using quotation marks.
if missing(age) ck_age=1.
if missing(work) and range(age,17,44) ck_work=1.
if missing(hours) and work="yes" ck_hours=1.
if not (missing (age)) and not(1>0) ck_age=2. /*this will never happen because of the not(1>0).
if not(missing(work)) and (not range(age,17,44)) ck_work=2. /*note that if age is missing, this ck_work won't be set here.
if not(missing(hours)) and (not(work="yes")) ck_hours=2.
EXECUTE.
Comparisons on string variables are case sensitive.
There is no missing equivalent in strings; an empty blank string ("") is still a string. not(work="yes") is true when work is blank ("").
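The varA/varB/varC check from the first part of this answer can be restated outside SPSS to make the four outcomes explicit. This is only a sketch in Python: None stands in for an SPSS missing value, the function name check_var_c is made up, and the codes 0 to 3 follow the syntax above.

```python
from typing import Optional


def check_var_c(
    var_a: Optional[float],
    var_b: Optional[float],
    var_c: Optional[float],
) -> int:
    """Return the check code for one case.

    0 = consistent, 1 = varC missing but required,
    2 = varC present but not allowed, 3 = condition cannot be evaluated.
    """
    if var_a is None or var_b is None:
        return 3  # varA > varB is undefined when either side is missing
    required = var_a > var_b
    if required and var_c is None:
        return 1
    if not required and var_c is not None:
        return 2
    return 0


# One case per outcome, as (varA, varB, varC).
cases = [(5, 3, 1), (5, 3, None), (3, 5, 1), (None, 5, 1)]
codes = [check_var_c(a, b, c) for a, b, c in cases]
```

Handling the unevaluable case first mirrors the role of the ck_varC=3 line: without it, cases with a missing varA or varB silently fall into none of the branches.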

Delphi - create Title/Proper/Mixed Case for Strings

I have a list of approx 100,000 names I need to process. Some are business names, some are people names. Unfortunately, some are lower, some are upper, and some are mixed. I am looking for a routine to convert them to proper case. (Sometimes called Mixed or Title case). I realize I can just loop through the string and capitalize every character that starts a new word. That would be an incredibly simplistic approach. For businesses, short words should be lowercase (of, with, for, ...). For last names, if it starts with Mc, the 3rd letter should be capitalized (McDermot, McDonald, etc). Roman numerals should always be capitalized (John Smith II ), etc.
I have not been able to find any Delphi built in, or otherwise, routines. Surely this is out there. Where can I find this?
Thanks
As others have already said, a fully automated routine for this is nearly impossible because there are so many special variations, so you cannot leave out human interaction completely.
What you can do instead is make the problem much easier for a human to solve. How? Build a dictionary of all the name variations in lowercase and present it to the reviewer.
Before presenting the names, you can make sure that the first letter of each name is already capitalized.
Once all the corrections have been made in the dictionary, you automatically replace all the names in the original database.
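The workflow above can be sketched as follows. This is Python rather than Delphi, the function names are made up, and the three sample names and the manual corrections are invented purely to show the mechanics.

```python
def build_review_dictionary(names):
    """Map each distinct lowercase word to a first-letter-capitalized guess."""
    dictionary = {}
    for name in names:
        for word in name.lower().split():
            dictionary.setdefault(word, word[:1].upper() + word[1:])
    return dictionary


def apply_dictionary(name, dictionary):
    """Replace every word of a name by its reviewed spelling.

    Note: split() collapses runs of whitespace into single spaces.
    """
    return " ".join(dictionary.get(word.lower(), word) for word in name.split())


# Invented sample data.
names = ["JOHN MCDONALD II", "bank of america", "Mary mcdonald"]

review = build_review_dictionary(names)

# The human step: only the entries the automatic guess got wrong are edited.
review.update({"mcdonald": "McDonald", "ii": "II", "of": "of"})

fixed = [apply_dictionary(name, review) for name in names]
```

The reviewer corrects each distinct word once, however many of the 100,000 names contain it.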

is it ever appropriate to localize a single ascii character

When would it be appropriate to localize a single ascii character?
For instance, / or | ?
Is it ever necessary to add these "strings" to the localization effort?
I just want to give some people the benefit of the doubt and make sure there's not something I didn't think of.
Generally it wouldn't be appropriate to use something like that except as a graphic element (which of course wouldn't be I18N'd in the first place, much less L10N'd). If you are trying to use it to e.g. indicate a ratio then you should have something like "%d / %d" instead, and localize the whole thing.
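A minimal sketch of localizing the whole pattern rather than the single character, in Python. The table, the function name format_ratio and the "x-demo" entry are invented for illustration; "x-demo" is not a real locale and only stands in for whatever a translator would supply.

```python
# Whole-pattern templates keyed by locale code (illustrative table).
RATIO_TEMPLATES = {
    "en": "{part} / {whole}",
    "x-demo": "{part} of {whole}",  # invented entry, not a real translation
}


def format_ratio(part: int, whole: int, locale_code: str) -> str:
    """Format a ratio using the locale's whole template, falling back to English."""
    template = RATIO_TEMPLATES.get(locale_code, RATIO_TEMPLATES["en"])
    return template.format(part=part, whole=whole)
```

Because the translator sees the full pattern, they can change the separator, the spacing, or the order of the two numbers, none of which is possible when only "/" is exposed as a string.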
Yes, there are cases where these individual characters change in localization. This is not a comprehensive list, just examples I happen to know.
Not every locale uses , to separate thousands and . for the decimal. (However, these will usually be handled by your number formatter. If you do so yourself, you're probably doing it wrong. See this MSDN blog post by Michael Kaplan, Number format and currency format are not always the same.)
Not every language uses the same quotation marks (“, ”, ‘ and ’). See Wikipedia on Non-English Uses of Quotation Marks. (Many of these are only easy to replace if you use full quote marks. If you use the " and ' on your keyboard to mark both the start and end of quotations, you won't know which of two symbols to substitute.)
In Spanish, a question or exclamation is preceded by an inverted ? or !. ¿Question? ¡Exclamation! (Obviously, you can't fix this with a locale substitution for a single character. Any questions or exclamations in your application should be entire strings anyway, unless you're writing some stunningly intelligent natural language generator.)
If you do find a circumstance where you need to localize these symbols, be extra cautious not to accidentally localize a symbol like / used as a file separator, " to denote a string literal or ? for a search wildcard.
However, this has already happened with CSV files. These may be separated by ,, or may be separated by the local list separator. See What would happen if you defined your system's CSV delimiter as being a quotation mark?
In Greek, questions end with a semicolon rather than ?, so essentially the ? is replaced with ; ... however, you should aim to always translate the question as a complete string including question mark anyway.

LaTeX: Extract numbers and letters from command variables and conversion to roman numerals

I'm writing a thesis and have been searching on many occasions trying to find a solution to my programming problem. Basically, I have a series of items that I've distinguished in my research data as "A1", "A2", "A3", … , "A13", "B1", "B2", and so on. These data labels, by the way, I can't change now because they have been used throughout my thesis. They are always formatted as [caps-letter][digit(length of 1 to 2 chars)], e.g., X20 or L9. For each data item, I want to assign a specific name. Because LaTeX doesn't allow numbers in a command name, I have already created a LONG list of the following types of commands to assign names to each data label:
\newcommand{\DataNameAi}[]%
{Data name for A1}
\newcommand{\DataNameAii}[]%
{Data name for A2}
% …
\newcommand{\DataNameXxi}[]%
{Data name for X11}
% …
and so on. Basically, as you can see I've named the command as "\DataName" followed by the letter (in caps), followed by the number written out as roman numerals. This was all manually done, and I did this only because LaTeX didn't seem to like any arabic numbers in the command name. If it permitted this, I would have used \DataNameA1 etc.
Elsewhere, I also have a command to reference the data specifically:
\newcommand{\GotoData}[1]%
{\hyperref[data#1]{Data~#1}}
See data at \GotoData{E10} % this links to another location labelled \label{dataE10}
Now I want to define a LaTeX command that takes just one argument, the data label (whether it's "Q30" or "A3"), and uses the \GotoData command as well as bringing up the corresponding data name from the \DataName*** command. That is, type \CompleteData{E10}, for example, and then have LaTeX produce something like:
"This is [Data E10] named [Data name for E10]."
This means the command might look something like:
\newcommand{\CompleteData}[1]%
{This is [\GotoData{#1}] named [\DataNameEx].}
\CompleteData{E10} % <--- this should look like "This is [Data E10] named [Data name for E10]."
As you can see, the code above is incomplete: I'm stuck with how to use the #1 variable to generate the necessary \DataName*** command within the \CompleteData newcommand.
So very basically, I see the only way as achieving this result is to have the code extract and convert the last number (one to two digits long) into a roman numeral. Specifically, I've been trying to figure out how to do a few things:
how to extract just the digits from the end of a parameter in a newcommand (such as the digits in my "Q31" or "A1" parameters).
similarly extract the letter from the first character of the parameter
how to convert numbers to numerals
I've tried searching in many different ways but never seem to find what I need to answer these two questions … I thought I was close when I found this site but later realised it's not what I'm after. The etextools LaTeX package also looked promising but I'm too novice (not even a programmer) to make much sense out of the help PDF that comes with my TexLive (2010) installation. I've also read about \roman and \romannumeral (e.g., here) but those two commands cause errors when I compile for some reason. On my computer, \roman{2} becomes "roman" while \romannumeral{2} becomes "2". Just don't understand how they work.
Any guidance, demonstration code, or hints would be greatly appreciated! Thank you.
Here's an example that works for me:
\documentclass{article}
\usepackage{hyperref}
\newcommand{\DataNameAii}{Data name for A2}
\newcommand{\GotoData}[1]{\hyperref[data#1]{Data~#1}}
\newcommand{\CompleteData}[1]{This is [\GotoData{#1}] named [\FormatDataName#1$].}
\newcounter{DataNumber}
\def\FormatDataName#1#2${\setcounter{DataNumber}{#2}\csname DataName#1\roman{DataNumber}\endcsname}
\begin{document}
\section{Data A2}\label{dataA2}
\CompleteData{A2}
\end{document}
\FormatDataName extracts the first character into #1, and the number into #2. It does that using the fact that \FormatDataName takes a delimited argument (delimited by a final $). After that, it's just a case of constructing the macro name you want to call (using \csname), and using \roman to format the number as roman numerals. (I think the reason you couldn't get this to work is that you weren't passing \roman a counter.)
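For anyone who wants to check which macro name a given label maps to (for instance, when generating the long list of \DataName definitions), the same split-and-convert logic can be sketched in Python. This is not part of the LaTeX solution; the function names are made up, and the label pattern (one capital letter followed by one or two digits) is taken from the question.

```python
import re

# Value/symbol pairs in descending order, lowercase to match the
# \DataNameAii naming scheme used in the question.
ROMAN = [
    (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
    (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
    (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
]


def to_roman(number: int) -> str:
    """Convert a positive integer to a lowercase roman numeral."""
    if number < 1:
        raise ValueError("roman numerals start at 1")
    pieces = []
    for value, symbol in ROMAN:
        count, number = divmod(number, value)
        pieces.append(symbol * count)
    return "".join(pieces)


def command_name(label: str) -> str:
    """Build the macro name (without backslash) for a label such as 'E10'."""
    match = re.fullmatch(r"([A-Z])([0-9]{1,2})", label)
    if match is None:
        raise ValueError(f"unexpected label format: {label!r}")
    letter, digits = match.groups()
    return "DataName" + letter + to_roman(int(digits))
```

With this, the labels A2, X11 and E10 map to DataNameAii, DataNameXxi and DataNameEx, matching the names used in the question.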