I have two text files, each containing one column, for example:
File_A  File_B
1       1
2       2
3       8
If I do grep -f File_A File_B > File_C, I get File_C containing 1 and 2. I want to know how to use grep -v on two files so that I can get the non-matching values, 3 and 8 in the above example.
Thanks.
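A minimal sketch of the grep -v approach (GNU grep; -F matches literally and -x matches whole lines, so 1 does not also match 10):
$ grep -vFxf File_B File_A   # values only in File_A
3
$ grep -vFxf File_A File_B   # values only in File_B
8
$ # both together
$ { grep -vFxf File_B File_A; grep -vFxf File_A File_B; } > File_C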
You can also use comm, if your version allows an empty output delimiter:
$ # -3 means suppress lines common to both input files
$ # by default, tab character appears before lines from second file
$ comm -3 f1 f2
3
	8
$ # change it to empty string
$ comm -3 --output-delimiter='' f1 f2
3
8
Note: comm requires sorted input, so use comm -3 --output-delimiter='' <(sort f1) <(sort f2) if they are not already sorted
You can also pass the common lines found by grep as input to grep -v. Tested with GNU grep; some versions might not support all these options.
$ grep -Fxf f1 f2 | grep -hxvFf- f1 f2
3
8
-F option to match strings literally, not as regex
-x option to match whole lines only
-h option to suppress the file name prefix
-f - (written above as f-) to read the patterns from stdin instead of from a file
awk 'NR==FNR{a[$0];next} {if ($0 in a) delete a[$0]; else print} END{for (k in a) print k}' f1 f2
8
3
To understand the meaning of NR and FNR, check the output of printing them:
awk '{print NR,FNR}' f1 f2
1 1
2 2
3 3
4 4
5 1
6 2
7 3
8 4
The condition NR==FNR is used to process only the first file, as NR and FNR are equal only while the first file is being read.
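This two-file pattern is a common awk idiom: collect keys while reading the first file, then test membership while reading the second. A minimal sketch using the f1/f2 files from the question:
$ # lines of f2 that are absent from f1
$ awk 'NR==FNR{seen[$0];next} !($0 in seen)' f1 f2
8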
With the GNU diff command (which compares files line by line):
diff --suppress-common-lines -y f1 f2 | column -t
The output (the left column contains lines from f1, the right column lines from f2):
3 | 8
-y, --side-by-side - output in two columns
I need to merge 2 lists based on columns 1 and 2.
file1:
client1,server1,3000.00
client1,server2,2500.00
client1,server3,1500.00
client2,server1,4500.00
client2,server2,2300.00
client2,server3,1230.00
client3,server1,3400.00
client3,server2,4500.00
client3,server3,1245.00
client4,server1,3400.00
client5,server2,4500.00
client6,server3,1245.00
client7,server1,3400.00
client7,server2,4500.00
client8,server3,1245.00
client8,server1,3400.00
client8,server2,4500.00
client9,server3,1245.00
file2:
client1,server1,windows,250g
client1,server2,linux,450g
client1,server3,linux,400g
client2,server1,windows,250g
client2,server2,linux,450g
client2,server3,linux,400g
client3,server1,windows,250g
client3,server2,linux,450g
client3,server3,linux,400g
What I need is to update file2 with the missing values from columns 1 and 2 of file1, adding commas to keep the same number of columns.
With this example the output should look like this:
client1,server1,windows,250g
client1,server2,linux,450g
client1,server3,linux,400g
client2,server1,windows,250g
client2,server2,linux,450g
client2,server3,linux,400g
client3,server1,windows,250g
client3,server2,linux,450g
client3,server3,linux,400g
client4,server1,,
client5,server2,,
client6,server3,,
client7,server1,,
client7,server2,,
client8,server3,,
client8,server1,,
client8,server2,,
client9,server3,,
I have tried with awk and join but I am not able to get this result.
If creating a new file is easier, that is no issue.
Thanks for your help.
Another awk way
awk -F, -vOFS="," 'NR!=FNR{NF--;NF+=2}!a[$1 FS $2]++' file2 file1
or
awk -F, 'NR!=FNR{$0=$1 FS $2",,"}!a[$1 FS $2]++' file2 file1
Shortest
awk -F, '{x=$1","$2}NR!=FNR{$0=x",,"}!a[x]++' file2 file1
Give this line a try:
awk -F, '{k=$1 FS $2}NR==FNR{a[k]++;print;next}!a[k]{print k",,"}' file2 file1
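The same logic spelled out with comments (a sketch):
awk -F, '
  { k = $1 FS $2 }                   # composite key: columns 1 and 2
  NR == FNR { a[k]++; print; next }  # first file (file2): record keys, print lines untouched
  !a[k] { print k",," }              # second file (file1): unseen keys get two empty columns
' file2 file1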
Using the join command. The problem is that join cannot join on multiple fields, so we temporarily replace the first comma:
join -t , -o 0,2.2,2.3 -a 1 <(sed 's/,/:/' file1) <(sed 's/,/:/' file2) | sed 's/:/,/'
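A breakdown of the options in that command:
-t , to use comma as the field separator
-o 0,2.2,2.3 to print the join field, then fields 2 and 3 of file2 (these stay empty for lines of file1 with no match)
-a 1 to also print the unpairable lines from file1
The sed calls turn client1,server1,... into client1:server1,... so that columns 1 and 2 act as a single join field; the final sed restores the comma.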
I tested the line below to compare the 1st columns of 2 files and make a union. However, when file2 has different values with an identical 1st column, all but the last one are eliminated. Below I attach the sample files, the obtained result, and the desired result.
awk -F, 'BEGIN{OFS=","}FNR==NR{a[$1]=$1","$2;next}($1 in a && $2=$2","a[$1])' file2.csv file1.csv >testout.txt
file1
John,red
John,blue
Mike,red
Mike,blue
Carl,red
Carl,blue
file2
John,V1
John,V2
Kent,V1
Kent,V2
Mike,V1
Mike,V2
obtained result
John,red,John,V2
John,blue,John,V2
Mike,red,Mike,V2
Mike,blue,Mike,V2
desired result
John,red,John,V1
John,red,John,V2
John,blue,John,V1
John,blue,John,V2
Mike,red,Mike,V1
Mike,red,Mike,V2
Mike,blue,Mike,V1
Mike,blue,Mike,V2
Try this one-liner:
awk -F, -v OFS="," 'NR==FNR{a[$0];next}{for(x in a)if(x~"^"$1FS)print $0,x}' file2 file1
test:
kent$ awk -F, -v OFS="," 'NR==FNR{a[$0];next}{for(x in a)if(x~"^"$1FS)print $0,x}' f2 f1
John,red,John,V1
John,red,John,V2
John,blue,John,V1
John,blue,John,V2
Mike,red,Mike,V1
Mike,red,Mike,V2
Mike,blue,Mike,V1
Mike,blue,Mike,V2
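Spelled out with comments, the same logic reads (a sketch):
awk -F, -v OFS="," '
  NR == FNR { a[$0]; next }   # first pass (file2): store each full line as an array key
  {                           # second pass (file1):
    for (x in a)              #   scan all stored file2 lines
      if (x ~ "^" $1 FS)      #   keep those starting with this line's first column
        print $0, x           #   print the pair (a cross join per matching name)
  }
' file2 file1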
Using join can also do this:
join -t, -1 1 -2 1 --nocheck-order -o 1.1,1.2,2.1,2.2 file1 file2
Output:
John,red,John,V1
John,red,John,V2
John,blue,John,V1
John,blue,John,V2
Mike,red,Mike,V1
Mike,red,Mike,V2
Mike,blue,Mike,V1
Mike,blue,Mike,V2
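Note that join expects its inputs to be sorted on the join field; --nocheck-order only silences the warning, and unsorted input can silently lose matches. If the files are not already sorted, a safer variant is:
join -t, -o 1.1,1.2,2.1,2.2 <(sort file1) <(sort file2)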
I am trying hard to get the output the way I'd like.
Current Output:
###Server1###
2
###Server2###
0
###Server3###
5
###Server4###
0
Required Output:
###Server1###
2
###Server3###
5
All I am looking for is to grep and ignore any line, and the previous line, that contains 0 (zero) anywhere in the line. I am using the bash shell.
This is a possible approach:
$ grep -B 1 "^\s*[1-9]$" file
###Server1###
2
--
###Server3###
5
To get rid of the group separator, we can also do:
$ grep --no-group-separator -B 1 "^\s*[1-9]$" file
###Server1###
2
###Server3###
5
Explanation
Instead of using grep -v to find the inverse, I think it is easier to look for the lines containing a single digit that is not 0. This is done with the "^\s*[1-9]$" expression, which allows spaces before the digit.
With -B 1 we make it print also the line before the matched one.
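If the counts can grow beyond a single digit, a pattern matching any non-zero integer works the same way (a sketch, GNU grep syntax):
$ grep --no-group-separator -B 1 '^\s*[0-9]*[1-9][0-9]*\s*$' file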
Code for GNU sed:
$ sed '$!N;/\s*\b0\b\s*/d' file
###Server1###
2
###Server3###
5
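The same script with each command annotated (GNU sed accepts comment lines inside a script):
sed '
  # append the next line to the pattern space (except on the last line)
  $!N
  # delete the two-line pair if it contains a standalone 0
  /\s*\b0\b\s*/d
' file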
I have a huge file in which I need to obtain every nth line and print it as a row.
My data:
1 937 4.320194
2 667 4.913314
3 934 1.783326
4 940 -0.299312
5 939 2.309559
6 936 3.229496
7 611 -1.41808
8 608 -1.154019
9 606 2.159683
10 549 0.767828
I want my data to look like this:
1 937 4.320194
3 934 1.783326
5 939 2.309559
7 611 -1.41808
9 606 2.159683
This is of course an example; I want every 10th line of my huge data file. I tried this so far:
NF == 6 {
    if (NR % 10) { print; }
}
To print every second line, starting with the first:
awk 'NR%2==1' file.txt
To print every tenth line, starting with the tenth line:
awk 'NR%10==0' file.txt
To use this in a script, add the following to a file called script.awk:
BEGIN {
print "Processing file"
}
NR%10==0
END {
print "Finished processing"
}
Then execute:
awk -f script.awk file.txt
With GNU sed, you can do a lot of variations on this quite easily with its first~step address form. For instance:
# Odd lines
sed -n 1~2p file
# Every tenth line (10, 20, 30, ...)
sed -n 10~10p file
# Every tenth line (1, 11, 21, ...)
sed -n 1~10p file
# First plus every tenth (1, 10, 20, 30, ...)
sed -n -e 1p -e 10~10p file
Piece of cake: awk 'NR % 10 == 1' test.txt
It's not (g)awk, but it'll work:
grep '^[[:digit:]]*0[[:blank:]]' myfile should do the trick (the pattern relies on the line number being the first column of the data, and matches the lines whose number ends in 0).
Doing it directly in the Command Prompt (Windows): put gawk.exe in the folder where the file is, start a Command Prompt in that folder, and write
gawk "NR%n==x" oldfile.txt>newfile.txt
n is the step (every nth line is printed) and x is the starting line.
E.g. n=10 and x=1 prints lines 1, 11, 21, 31, 41, ... through the end of the original file into the new file.
E.g. n=20 and x=5 prints lines 5, 25, 45, 65, ... through the end of the original file into the new file.
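For instance, the first case as a concrete command:
gawk "NR%10==1" oldfile.txt>newfile.txt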