AWK print format problem - printing

I have a file with data
number name age sex
102234 James_Mckenzie 21 M
102233 Jim_Reil 24 M
102235 Alan_Lightbrighter 19 M
...
and I am trying to print them out in such form
number :
name : age :
sex :
so basically, the printout will be like this,
number : 102233
name : Jim_Reil age : 24
sex : M
number : 102235
name : Alan_Lightbrighter age : 19
sex : M
...
The problem is I am trying to keep up with spacing between name and age, but due to the variable name length, the position of 'age' is not where I am want(I used /t for spacing)
What would be the best way to fix this issue?
I am sorry if this has been asked thousand times already.(I spend some time searching but I guess my search keyword sucked :( )
Thanks

Don't use tabs for spacing since that relies entirely on the vagaries of your terminal.
Figure out what the largest name is likely to be and use printf with a %-50s format specifier (50 characters wide, left justified) on your name, as per the following transcript:
$ echo '1 pax X
2 paxdiablo_with_a_very_long_name Y' | awk '{printf "%3d %-40s %s\n",$1,$2,$3}'
1 pax X
2 paxdiablo_with_a_very_long_name Y
Or, closer to your requirements:
$ echo '102234 James_Mckenzie 21 M
102233 Jim_Reil 24 M
102235 Alan_Lightbrighter 19 M' | awk '
{printf "number : %d\nname : %-20s age : %d\nsex : %s\n",$1,$2,$3,$4}'
number : 102234
name : James_Mckenzie age : 21
sex : M
number : 102233
name : Jim_Reil age : 24
sex : M
number : 102235
name : Alan_Lightbrighter age : 19
sex : M
Obviously, the 50 and 20 are examples - you should choose a size that suits your needs. If necessary, you could even go through the file on the first pass to work out the largest name and then use a format string constructed from that in a second pass to print, but that's probably overkill.

Related

ANTLR4 and chess UCI parser problems with simple grammar

I want to define simple grammar for UCI protocol but I have some problems with it.
I want parse such line:
id name hello world
Nothing complex I wrote such grammar:
grammar s01_uci;
id_name
: 'id' SPACE 'name' SPACE ID_NAME
;
SPACE
: ' '
;
ID_NAME
: .+?
;
LINE_END
: '\r\n'
| '\r'
| '\n'
;
My Pycharm Antlr4 plugin shows that only 'id name h' is parsed rest is not visible to parser. Why it happens - wrong lexer rules?
It look simple that I should match token but maybe I define too much token or not enough in Lexer - I have not idea how to improve it after reading many articles.
Some example of lines to be parsed (input is lines not file):
id name Stockfish 10 64 BMI2
id author T. Romstad, M. Costalba, J. Kiiski, G. Linscott
option name Debug Log File type string default
option name Contempt type spin default 24 min -100 max 100
option name Analysis Contempt type combo default Both var Off var White var Black var Both
option name Threads type spin default 1 min 1 max 512
option name Hash type spin default 16 min 1 max 131072
option name Clear Hash type button
option name Ponder type check default false
option name MultiPV type spin default 1 min 1 max 500
option name Skill Level type spin default 20 min 0 max 20
option name Move Overhead type spin default 30 min 0 max 5000
option name Minimum Thinking Time type spin default 20 min 0 max 5000
option name Slow Mover type spin default 84 min 10 max 1000
option name nodestime type spin default 0 min 0 max 10000
option name UCI_Chess960 type check default false
option name UCI_AnalyseMode type check default false
option name SyzygyPath type string default <empty>
option name SyzygyProbeDepth type spin default 1 min 1 max 100
option name Syzygy50MoveRule type check default true
option name SyzygyProbeLimit type spin default 7 min 0 max 7
uciok

How to count locals in ANS-Forth?

While developing BigZ, mostly used for number theoretical experiments, I've discovered the need of orthogonality in the word-set that create, filter or transform sets. I want a few words that logically combinated cover a wide range of commands, without the need to memorize a large number of words and ways to combinate them.
1 100 condition isprime create-set
put the set of all prime numbers between 1 and 100 on a set stack, while
function 1+ transform-set
transform this set to the set of all numbers p+1, where p is a prime less than 100.
Further,
condition sqr filter-set
leaves the set of all perfect squares on the form p+1 on the stack.
This works rather nice for sets of natural numbers, but to be able to create, filter and transform sets of n-tuples I need to be able to count locals in unnamed words. I have redesigned words to shortly denote compound conditions and functions:
: ~ :noname ;
: :| postpone locals| ; immediate
1 100 ~ :| p | p is prime p 2 + isprime p 2 - isprime or and ;
1 100 ~ :| a b | a dup * b dup * + isprime ;
Executing this two examples gives the parameter stack ( 1 100 xt ) but to be able to handle this right, in the first case a set of numbers and in the second case a set of pairs should be produced, I'll have to complement the word :| to get ( 1 100 xt n ) where n is the numbet of locals used. I think one could use >IN and PARSE to do this, but it was a long time ago I did such things, so I doubt I can do it properly nowadays.
I didn't understand (LOCALS) but with patience and luck I managed to do it with my original idea:
: bl# \ ad n -- m
over + swap 0 -rot
do i c# bl = +
loop negate ;
\ count the number of blanks in the string ad n
variable loc#
: locals# \ --
>in # >r
[char] | parse bl# loc# !
r> >in ! ; immediate
\ count the number of locals while loading
: -| \ --
postpone locals#
postpone locals| ; immediate
\ replace LOCALS|
Now
: test -| a b | a b + ;
works as LOCALS| but leave the number of locals in the global variable loc#.
Maybe you should drop LOCALS| and parse the local variables yourself. For each one, call (LOCAL) with its name, and end with passing an empty string.
See http://lars.nocrew.org/dpans/dpans13.htm#13.6.1.0086 for details.

How to use foreach / forv to replace duplicates in an increasing order

I have a "raw" data set that I´m trying to clean. The data set consists of individuals with the variable age between year 2000 and 2010. There are around 20000 individuals in the data set with the same problem.
The variable age is not increasing in the years 2004-2006. For example, for one individual it looks like this:
2000: 16,
2001: 17,
2002: 18,
2003: 19,
2004: 19,
2005: 19,
2006: 19,
2007: 23,
2008: 24,
2009: 25,
2010: 26,
So far I have tried to generate variables for the max age and max year:
bysort id: egen last_year=max(year)
bysort id: egen last_age=max(age)
and then use foreach combined with lags to try to replace age variable in decreasing order so that when the new variable last_age (that now are 26 in all years) rather looks like this:
2010: 26
2009: 25 (26-1)
2008: 24 (26-2) , and so on.
However, I have some problem with finding the correct code for this problem.
Assuming that for each individual the first value of age is not missing and is correct, something like this might work
bysort id (year): replace age = age[1]+(year-year[1])
Alternatively, if the last value of age is assumed to always be accurate,
bysort id (year): replace age = age[_N]-(year[_N]-year)
Or, just fix the ages where there is no observation-to-observation change in age
bysort id (year): replace age = age[_n-1]+(year-year[_n-1]) if _n>1 & age==age[_n-1]
In the absence of sample data none of these have been tested.
William's code is very much to the point, but a few extra remarks won't fit easily into a comment.
Suppose we have age already and generate two other estimates going forward and backward as he suggests:
bysort id (year): gen age2 = age[1] + (year - year[1])
bysort id (year): gen age3 = age[_N] - (year[_N] - year)
Now if all three agree, we are good, and if two out of three agree, we will probably use the majority vote. Either way, that is the median; the median will be, for 3 values, the sum MINUS the minimum MINUS the maximum.
gen median = (age + age2 + age3) - max(age, age2, age3) - min(age, age2, age3)
If we get three different estimates, we should look more carefully.
edit age* if max(age, age2, age3) > median & median > min(age, age2, age3)
A final test is whether medians increase in the same way as years:
bysort id (year) : assert (median - median[_n-1]) == (year - year[_n-1]) if _n > 1

Creating numbered variable names using the foreach command

I have a list of variables for which I want to create a list of numbered variables. The intent is to use these with the reshape command to create a stacked data set. How do I keep them in order? For instance, with this code
local ct = 1
foreach x in q61 q77 q99 q121 q143 q165 q187 q209 q231 q253 q275 q297 q306 q315 q324 q333 q342 q351 q360 q369 q378 q387 q396 q405 q414 q423 {
gen runs`ct' = `x'
local ct = `ct' + 1
}
when I use the reshape command it generates an order as
runs1 runs10 runs11 ... runs2 runs22 ...
rather than the desired
runs01 runs02 runs03 ... runs26
Preserving the order is necessary in this analysis. I'm trying to add a leading zero to all ct values less than 10 when assigning variable names.
Generating a series of identifiers with leading zeros is a documented and solved problem: see e.g. here.
local j = 1
foreach v in q61 q77 q99 q121 q143 q165 q187 q209 q231 q253 q275 q297 q306 q315 q324 q333 q342 q351 q360 q369 q378 q387 q396 q405 q414 q423 {
local J : di %02.0f `j'
rename `v' runs`J'
local ++j
}
Note that I used rename rather than generate. If you are going to reshape the variables afterwards, the labour of copying the contents is unnecessary. Indeed the default float type for numeric variables used by generate could in some circumstances result in loss of precision.
I note that there may also be a solution with rename groups.
All that said, it's hard to follow your complaint about what reshape does (or does not) do. If you have a series of variables like runs* the most obvious reshape is a reshape long and for example
clear
set obs 1
gen id = _n
foreach v in q61 q77 q99 q121 q143 {
gen `v' = 42
}
reshape long q, i(id) j(which)
list
+-----------------+
| id which q |
|-----------------|
1. | 1 61 42 |
2. | 1 77 42 |
3. | 1 99 42 |
4. | 1 121 42 |
5. | 1 143 42 |
+-----------------+
works fine for me; the column order information is preserved and no use of rename was needed at all. If I want to map the suffixes to 1 up, I can just use egen, group().
So, that's hard to discuss without a reproducible example. See
https://stackoverflow.com/help/mcve for how to post good code examples.

Obtaining the quantity and proportion in SPSS 21

I have the data in a sav file
CODE | QUANTITY
------|----------
A | 1
B | 4
C | 1
F | 3
B | 3
D | 12
D | 5
I need to obtain the quantity of codes which have a quantity <= 3 and to obtain the proportion in a percentage with respect to the total number and present a result like this
<= 3 | PERCENTAGE
------|----------
4 | 57 %
All of this using SPSS syntax.
I would first convert the quantity value to a 0-1 variable, and then aggregate by code to the mean. This produces a nice second dataset to make a table. Example below.
data list free / Code (A1) Quantity (F2.0).
begin data
A 1
B 4
C 1
F 3
B 3
D 12
D 5
end data.
*convert to 0-1.
compute QuantityB3 = (Quantity LE 3).
*Aggregate.
DATASET DECLARE AggQuant.
AGGREGATE
/OUTFILE='AggQuant'
/BREAK=Code
/QuantityB3 = MEAN(QuantityB3).
I dont know how you migrate your question here, I dont have reputation here to add screen shoots that's help you allot. Anyhow the procedure of your desire output is given below.
Goto Transform->Count Values within cases a dialogue box open, write the name of new variable say "New" in Target Variable: go to define values a new dialogue box is open then check the radio button Range, LOWEST through value: put in below box 3 and then press add and press continue and press ok. A new variable is created with the name of "New". Now go to Analyze -> Descriptive Statistics-> Frequencies, new dialogue box will be open send "New" variable into Variable(s): press Statistics in new dialogue box check Percentile(s): write 100 in box and press Add and then continue and ok. You get the desire results.

Resources