I have a program whose output is a summary file with a header and a few columns of results.
I want to show only two pieces of data, the file name and the best period prediction, and I use this command:
program input_file | gawk 'NR==2 {print $3}; NR==4 {print $2}'
As a result I obtain the two values in one column, on two lines. What do I have to do to get this result on one line, in two columns?
You could use:
program input_file | gawk 'NR==2 {heading = $3}; NR==4 {print heading " = " $2}'
This saves the value of $3 on line 2 in the variable heading, then prints that heading together with the value from column 2 when it reads line 4.
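For example, with a made-up summary laid out the way the question describes (the name of interest in field 3 of line 2, the value in field 2 of line 4; the sample text here is invented for illustration):
$ printf 'SUMMARY\nInput file: star.dat\n\nPeriod: 1.234 days\n' |
  gawk 'NR==2 {heading = $3}; NR==4 {print heading " = " $2}'
star.dat = 1.234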
I have quite a large fixed-format file without spaces (file1):
file1:
0808563800555550000367120000500000
0005555566369330000078020000500000
01066666780000000008933600009000005635
0904251263088000000786590056500000
0000469011009904440425120444444440
I want to extract character positions 4-8, 11-15 and 20-24 from lines where positions 4-8 (only) match a list of IDs in file2
file2:
55555
42512
The desired output is:
55555 36933 07802
42512 08800 78659
I have tried the following combination of cut | grep commands:
cut -c 4-8,11-15,20-24 file1 --output-delimiter=' ' | grep -w -F -f file2
It works and the speed is very good, but the problem is that I also get lines where the lookup ID (positions 4-8) is not in the first column of the cut data; that is because grep checks all three columns after cut, not only the first one.
Here is the output of the command above:
85638 55555 36712
55555 36933 07802
66666 00000 89336
42512 08800 78659
04690 00990 42512
I know one could write the output to a file and then use, for example, awk, but I thought there might be a simpler approach that avoids longer processing time (for example, making grep match only in a specific cut column).
Any help will be much appreciated, many thanks!
With GNU awk for FIELDWIDTHS:
$ awk -v FIELDWIDTHS='3 5 2 5 4 5 *' 'NR==FNR{a[$0]; next} $2 in a{ print $2, $4, $6 }' file2 file1
55555 36933 07802
42512 08800 78659
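For reference, the widths 3 5 2 5 4 5 map $2 onto characters 4-8 (the ID), $4 onto 11-15 and $6 onto 20-24, and the trailing * puts whatever is left into $7 (the * form needs a reasonably recent gawk, 4.2 or later). You can check the split on a single sample line:
$ echo '0005555566369330000078020000500000' | gawk -v FIELDWIDTHS='3 5 2 5 4 5 *' '{print $2, $4, $6}'
55555 36933 07802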
Would you please try the following:
cut -c 4-8,11-15,20-24 file1 --output-delimiter=' ' | grep -wf <(sed 's/^/^/' file2)
Each line in file2 is prepended with a caret ^ character to anchor the match to the start of each line of the cut output.
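To see what grep is actually matching against, you can run the pattern-generating step on its own; each ID becomes an anchored regular expression:
$ sed 's/^/^/' file2
^55555
^42512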
It may be a bit slower than before due to the lack of the -F option.
I have a corpus file and a rules file. I am trying to find the matching words where a word from the rules appears in the corpus.
# cat corpus.txt
this is a paragraph number one
second line
third line
# cat rule.txt
a
b
c
This returns 2 lines:
# grep -F -f rule.txt corpus.txt
this is a paragraph number one
second line
But I am expecting 4 words like this...
a
paragraph
number
second
I am trying to achieve these results using grep or awk.
Assuming words are separated by whitespace:
awk '{print "\\S*" $1 "\\S*"}' rule.txt | grep -m 4 -o -f - corpus.txt
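The awk step just wraps each rule in a pattern that swallows the whole surrounding word; you can inspect the generated patterns separately (\S is a GNU grep extension meaning a non-whitespace character):
$ awk '{print "\\S*" $1 "\\S*"}' rule.txt
\S*a\S*
\S*b\S*
\S*c\S*
grep -o then prints each match, i.e. each whole word containing one of the rule letters, on its own line.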
A single entry has multiple lines, and each entry is separated by two blank lines.
Each entry has to be made into a single line followed by a delimiter (;).
Sample Input:
Name:Sid
ID:123


Name:Jai
ID:234


Name:Arun
ID:12
I tried replacing the blank lines with cat test.cap | tr -s '[:space:]' ';'
Output:
Name:Sid;ID:123;Name:Jai;ID:234;Name:Arun;ID:12;
Expected Output:
Name:SidID:123;Name:JaiID:234;Name:ArunID:12;
The same happens with xargs. I've used a sed command as well, but it only joined two lines into one, whereas I have 132 lines in one entry and 1000 such entries in one file.
You may use
cat file | awk 'BEGIN { FS = "\n"; RS = "\n\n"; ORS=";" } { gsub(/\n/, "", $0); print }' | sed 's/;;*$//' > output.file
Output:
Name:SidID:123;Name:JaiID:234;Name:ArunID:12
Notes:
FS = "\n" will set the field separator to a newline
RS = "\n\n" will set the record separator to a double newline (a blank line)
gsub(/\n/, "", $0) will remove all newlines from a found record
sed 's/;;*$//' will remove the trailing ; added by awk
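A close variant (a sketch, using awk's standard paragraph mode rather than gawk's multi-character RS): with RS = "" any run of blank lines separates records, so the double blank lines are handled directly, and since ORS puts a ; after every record, including the last, the trailing sed cleanup is not needed and the output matches the expected output, trailing ; included:
awk 'BEGIN { FS = "\n"; RS = ""; ORS = ";" } { gsub(/\n/, ""); print }' file
Name:SidID:123;Name:JaiID:234;Name:ArunID:12;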
Could you please try the following.
awk 'NF{val=(val?$0~/^ID/?val $0";":val $0:$0)} END{print val}' Input_file
Output will be as follows.
Name:SidID:123;Name:JaiID:234;Name:ArunID:12;
Explanation: here is a detailed explanation of the above code.
awk ' ##Start the awk program.
NF{ ##Process only lines that are not empty.
val=(val?$0~/^ID/?val $0";":val $0:$0) ##Append the current line to val; if the line starts with the string ID, also append a semicolon, otherwise append nothing extra.
}
END{ ##Start the END section of awk.
print val ##Print the value of the variable val.
}
' Input_file ##Mention the Input_file name.
This might work for you (GNU sed):
sed -r '/./{N;s/\n//;H};$!d;x;s/.//;s/\n|$/;/g' file
If the line is not blank, append the following line and remove the newline between them, then append the result to the hold space. If it is not the end of the file, delete the current line. At the end of the file, swap in the hold space, remove the first character (which will be a newline), and then replace each newline, and the end of the line (which supplies the final semicolon), with a semicolon.
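For readability, the same script can be laid out across lines with comments (a sketch; GNU sed accepts comment lines inside a script):
sed -r '
  # on a non-blank line: append the next line, join the pair, add it to the hold space
  /./{N;s/\n//;H}
  # on every line but the last, print nothing
  $!d
  # last line: swap in the hold space, drop its leading newline,
  # then turn each newline (and the end of the line) into ";"
  x
  s/.//
  s/\n|$/;/g
' file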
I'm totally new to AWK; however, I think it is the best way to solve my problem, and a good time to learn AWK.
I am trying to read a large data file that is created by a simulation program. The output is made to be readable by humans, so its formatting isn't very consistent. An example of the output is in this image
http://i.imgur.com/0kf8l.png
I need a way to find a line like "He 2 4686A -2.088 0.0071" by specifying the "He 2 4686A" part, and to get the two numbers that follow. The problem is that the line "He 2 4686A -2.088 0.0071" can appear anywhere in the table.
I know how to find the entry "He 2 4686A", but I don't know which of the 4 columns it's in. So I don't know how to address the values that follow it.
Either a command that lets me just read the next two words, or one that tells me the location of the pattern once a match is found, would help.
/He 2 4686A/ finds the line
Ca A 3970A -0.900 0.1100 He 2 4686A -2.088 0.0071 S 3 18.67m -0.371 0.3721 Ar 4 444.7A -2.124 0.0066
Any help is appreciated.
The first step should be to bring what seems to be 4 columns of records into a 1-column format... then it's easy with awk, because you can filter on the leading fields - like:
echo "He 2 4686A -2.088 0.0071" | \
awk '$1 == "He" && $2 == 2 && $3 == "4686A" {print $4, $5}'
which gives
-2.088 0.0071
So, for me, the only challenge is to transform your data into one-column format... and from the picture that looks simple, because the columns seem to have a fixed width which you can count.
Assuming that your column width is 30 characters (difficult to tell from a picture; beware of tabs) and your data is in input_file, you could first "cut" the data into 4 columns and then pipe the output to another awk process:
awk '{
print substr($0,1,30)
print substr($0,31,30)
print substr($0,61,30)
print substr($0,91,30)
}' input_file | \
awk '$1 == "He" && $2 == 2 && $3 == "4686A" {print $4, $5}'
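If you want less repetition, the same split can be written as a loop (a sketch, still assuming four 30-character columns):
awk '{ for (i = 1; i <= 91; i += 30) print substr($0, i, 30) }' input_file | \
awk '$1 == "He" && $2 == 2 && $3 == "4686A" {print $4, $5}'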
If you really just need the next two numbers after an anchor, then I would say the grep solution from Costa is best for you; however, this approach gives you the possibility of implementing further logic...
If you're not dead set on using awk, grep would be the easiest way...
egrep -o "He 2 4686A -?[0-9.]+ -?[0-9.]+" output.txt
EDIT: The above works only if the fields are separated by single spaces, which doesn't seem to be your case. To handle tabs and/or runs of whitespace, use [[:blank:]], which matches a space or a tab (inside an ERE bracket expression, \t would be taken literally):
egrep -o "He[[:blank:]]+2[[:blank:]]+4686A[[:blank:]]+-?[0-9.]+[[:blank:]]+-?[0-9.]+" output.txt
I have log entries that are paired two lines each. I have to parse the first line to extract a number and check whether it is greater than 5000. If it is, then I need to return the second line, which will also be parsed to retrieve an ID.
I know how to grep all of the info and parse it. What I don't know is how to make grep skip entries when the number is less than a particular value. Note that I am not committed to using grep if some other means like awk/sed can be substituted.
Raw data (the two lines are shown separated here for clarity). The target of my grep is the number 5001 following "credits extracted = "; if this is over 5000 then I want to return the number "12345" from the second line:
2012-03-16T23:26:12.082358 0x214d000 DEBUG ClientExtractAttachmentsPlayerMailTask for envelope 22334455 finished: credits extracted = 5001, items extracted count = 0, status = 0. [Mail.heomega.mail.Mail](PlayerMailTasks.cpp:OnExtractAttachmentsResponse:944)
2012-03-16T23:26:12.082384 0x214d000 DEBUG Mail Cache found cached mailbox for: 12345 [Mail.heomega.mail.Mail](MailCache.cpp:GetCachedMailbox:772)
Snippets:
-- Find the number of credits extracted, without the comma noise:
grep "credits extracted = " fileName.log | awk '{print $12}' | awk -F',' '{print $1}'
-- Find the second line's ID no matter what the value of credits extracted is:
grep -A1 "credits extracted = " fileName.log | grep "cached mailbox for" | awk -F, '{print $1}' | awk '{print $10}'
-- An 'if' statement symbolizing the logic I need to acquire:
v_CredExtr=5001; v_ID=12345; if [ "$v_CredExtr" -gt 5000 ]; then echo "$v_ID"; fi
You can do everything with a single AWK filter I believe:
#!/usr/bin/awk -f
/credits extracted =/ {
    # $12 holds the credits value with a trailing comma, e.g. "5001,";
    # strip the comma and coerce to a number
    credits = substr($12, 1, length($12) - 1) + 0
    if (credits > 5000)
        show_id = 1
    next
}
show_id == 1 {
    # this is the line right after a qualifying entry; $10 is the mailbox ID
    print $10
    show_id = 0
}
Obviously, you can stuff the whole AWK script into a shell string inside a script, even a multiline one. I showed it here as its own script for clarity.
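For example, the same logic as a one-liner (fileName.log taken from the question):
awk '/credits extracted =/ { c = substr($12, 1, length($12) - 1) + 0; if (c > 5000) show_id = 1; next } show_id { print $10; show_id = 0 }' fileName.log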
P.S.: Please let me know whether it works ;-)