I'm trying to grep the following string:
Line must start with a 15 and the rest of the string can have any length of numbers between the pipes. There must be nothing in between the last 2 pipes.
"15|155702|0101|1||"
So far i have:
grep "^15|" $CONCAT_FILE_NAME >> "VAS-"$CONCAT_FILE_NAME
I'm having trouble specifying any length of numbers when using [0-9]
You need to escape the |
grep -E '^15\|([[:digit:]]+\|)+\|$'
Assuming the beginning must start with 15| and there are a total of 5 pipes(|) and nothing between the last two pipes.. And any number of digits between the 2nd 3rd and 4th pipes.
grep "^15\|[0-9]*\|[0-9]*\|[0-9]*\|\|$" $CONCAT_FILE_NAME >> "VAS-"$CONCAT_FILE_NAME
Using awk
cat file
15|155702|0101|1||
15|155702|0101|1|test|
16|155702|0101|1||
awk -F\| '/^15/ && !$(NF-1)' file
15|155702|0101|1||
This prints a line only if it starts with 15 and the second last field, separated by | is blank
So this would be:
VAS-CONCAT_FILE_NAME=$(awk -F\| '/^15/ && !$(NF-1)' <<<"$CONCAT_FILE_NAME")
Another shorter regex that would work
awk '/^15.*\|\|$/' file
This search for all lines starting with 15 and ends with ||
Related
I'm trying to print line containing 2 or 3 numbers along with the rest of the line. I came with the code:
grep -P '[[:digit:]]{2,3}' address
But this even prints the line having 4 digits. I don know why is this happening.
Output:
Neither this code works;
grep -E '[0-9]{2,3}' address
Here is the file containing address text:
12 main st
123 main street
1234 main street
I have already specified to print 2 or 3 values with {2,3} still the filter doesn't work and more than 3 digits line is being printed. Can anyone assist me on this? Thank you so much.
You can use inverted grep (-v) to filter all lines with 4 digits (and above):
grep -vE '[0-9]{4}' address
EDIT:
I noticed you want only 2 or 3 digit along the line, so first command will get you also 1 digit.
Here's the fix, again using same method:
grep -E '[0-9]{2,3}' txt.txt | grep -vE '[0-9]{4}'
I have some files, and I want grep to return the lines, where I have at least one string Position:"Engineer" AND at least one string which does have Position not equal to "Engineer"
So in the below file should return only first line:
Position:"Engineer" Name:"Jes" Position:"Accountant" Name:"Criss"
Position:"Engineer" Name:"Eva" Position:"Engineer" Name:"Adam"
I could write something like
grep 'Position:"Engineer"' filename | grep 'Position:"Accountant"'
And this works fine (I get only first line), but the thing is I don't know what are all of the possible values in Position, so the grep needs to be generic something like
grep 'Position:"Engineer"' filename | grep -v 'Position:"Engineer"'
But this doesn't return anything (as both grep contradict each other)
Do you have any idea how this can be done?
This line works :
grep "^Position:\"Engineer\"" filename | grep -v " Position:\"Engineer\""
The first expresion with "$" catch only the Position at the begining of line, the second expression with " " space remove the second "Postion" expression.
You can avoid the pipe and additional subshell by using awk if that is allowed, e.g.
awk '
$1~/Engineer/ {if ($3~/Engineer/) next; print}
$3~/Engineer/ {if ($1~/Engineer/) next; print}
' file
Above just checks if the first field contains Engineer and if so checks if field 3 also contains Engineer, and if so skips the record, if not prints it. The second rule, just swaps the order of the tests. The result of the tests is that Engineer can only appear in one of the fields (either first or third, but not both)
Example Use/Output
With your sample input in file, you would have:
$ awk '
$1~/Engineer/ {if ($3~/Engineer/) next; print}
$3~/Engineer/ {if ($1~/Engineer/) next; print}
' file
Position:"Engineer" Name:"Jes" Position:"Accountant" Name:"Criss"
Use negative lookahead to exclude a pattern after match.
grep 'Position:"Engineer"' | grep -P 'Position:"(?!Engineer)'
With two greps in a pipe:
grep -F 'Position:"Engineer"' file | grep -Ev '(Position:"[^"]*").*\1'
or, perhaps more robustly
grep -F 'Position:"Engineer"' file | grep -v 'Position:"Engineer".*Position:"Engineer"'
In general case, if you want to print the lines with unique Position fields,
grep -Ev '(Position:"[^"]*").*\1' file
should do the job, assuming all the lines have the format specified. This will work also when there are more than two Position fields in the line.
I would like to grep for process path which has a variable. Example -
This is one of the proceses running.
/var/www/vhosts/rcsdfg/psd_folr/rcerr-m-deve-udf-172/bin/magt queue:consumers:start customer.import_proditns --single-thread --max-messages=1000
I would like to grep for "psd_folr/rcerr-m-deve-udf-172/bin/magt queue" from the running processes.
The catch is that the number 172 keeps changing, but it will be a 3 digit number only. Please suggest, I tried below but it is not returning any output.
sudo ps axu | grep "psd_folr/rcerr-m-deve-udf-'^[0-9]$'/bin/magt queue"
The most relevant section of your regular expression is -'^[0-9]$'/ which has following problems:
the apostrophes have no syntactical meaning to grep other than read an apostrophe
the caret ^ matches the beginning of a line, but there is no beginning of a line in ps's output at this place
the dollar $ matches the end of a line, but there is no end of a line in ps's output at this place
you want to read 3 digits but [0-9] will only match a single one
Thus, the part of your expression should be modified like this -[0-9]+/ to match any number of digits (+ matches the preceding character any number of times but at least once) or like this -[0-9]{3}/ to match exactly three times ({n} matches the preceding character exactly n times).
If you alter your command, give grep the -E flag so it uses extended regular expressions, otherwise you need to escape the plus or the braces:
sudo ps axu | grep -E "psd_folr/rcerr-m-deve-udf-[0-9]+/bin/magt queue"
Please I have question: I have a file like this
#HWI-ST273:296:C0EFRACXX:2:2101:17125:145325/1
TTAATACACCCAACCAGAAGTTAGCTCCTTCACTTTCAGCTAAATAAAAG
+
8?8A;DDDD;#?++8A?;C;F92+2A#19:1*1?DDDECDE?B4:BDEEI
#BBBB-ST273:296:C0EFRACXX:2:1303:5281:183410/1
TAGCTCCTTCGCTTTCAGCTAAATAAAAGCCCAGTACTTCTTTTTTACCA
+
CCBFFFFFFHHHHJJJJJJJJJIIJJJJJJJJJJJJJJJJJJJIJJJJJI
#HWI-ST273:296:C0EFRACXX:2:1103:16617:140195/1
AAGTTAGCTCCTTCGCTTTCAGCTAAATAAAAGCCCAGTACTTCTTTTTT
+
#C#FF?EDGFDHH#HGHIIGEGIIIIIEDIIGIIIGHHHIIIIIIIIIII
#HWI-ST273:296:C0EFRACXX:2:1207:14316:145263/1
AATACACCCAACCAGAAGTTAGCTCCTTCGCTTTCAGCTAAATAAAAGCC
+
CCCFFFFFHHHHHJJJJJJJIJJJJJJJJJJJJJJJJJJJJJJJJJJJIJ
I
I'm interested just about the line that starts with '#HWI', but I want to count all the lines that are not starting with '#HWI'. In the example shown, the result will be 1 because there's one line that starts with '#BBB'.
To be more clear: I just want to know know the number of the first line of the patterns (that are 4 line that repeated) that are not '#HWI'; I hope I'm clear enough. Please tell me if you need more clarification
With GNU sed, you can use its extended address to print every fourth line, then use grep to count the ones that don't start with #HWI:
sed -n '1~4p' file.fastq | grep -cv '^#HWI'
Otherwise, you can use e.g. Perl
perl -ne 'print if 1 == $. % 4' -- file.fastq | grep -cv '^#HWI'
$. contains the current line number, % is the modulo operator.
But once we're running Perl, we don't need grep anymore:
perl -lne '++$c if 1 == $. % 4; END { print $c }' -- file.fastq
-l removes newlines from input and adds them to output.
I have below numbers in a file
44700101
44700201
44700301
44700401
44700501
44700601
44700701
44700801
44700901
44701001
want to fetch the above numbers whose 5th and 6th digits are greater than 5 USING GREP WILDCARDS.
something like "grep ....[6-10].. file" should yield below
44700601
44700701
44700801
44700901
44701001
Any help will be appreciated. Thanks
gawk (GNU awk) approach:
awk '{split($0,a,"")}int(a[5]a[6])>5' file
gawk has the ability for FS and for the third argument to split() to be null strings
split($0,a,"") - splits the numeric string into separate numbers (filling array a)
int(a[5]a[6])>5 - print the line if integer representation of the 5th and 6th numbers is greater than 5
grep approach:
grep '^[0-9]\{4\}\([1-9]\|0[6-9]\).*' file
The output (for both approaches):
44700601
44700701
44700801
44700901
44701001
Just use awk:
$ awk 'substr($0,5,2)+0 > 5' file
44700601
44700701
44700801
44700901
44701001