Append grep match count number on same line - grep

I did this, but it appends the count on a new line instead of at the end of the last line:
grep -rc "WORD" /home/user/data >> file

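Remove the trailing newline (\n) from grep's output, then redirect it to the file. Note that tr -d '\n' deletes every newline, so if grep -rc prints one file:count line per file, they will all be joined together: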
grep -rc "WORD" /home/user/data | tr -d '\n' >> file
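A minimal sketch of why the tr approach works, assuming the target file's last line does not already end in a newline. The temporary directory, file names, and sample text are made up for illustration, and a single file is searched so that grep prints a bare count.

```shell
#!/bin/sh
# Hypothetical sample data: two of the three lines contain WORD.
tmpdir=$(mktemp -d)
printf 'WORD one\nnothing here\nWORD two\n' > "$tmpdir/data.txt"

# The target file's last line deliberately has no trailing newline.
printf 'matches: ' > "$tmpdir/file"

# grep -c prints the count plus a newline; tr -d '\n' drops that newline.
grep -c "WORD" "$tmpdir/data.txt" | tr -d '\n' >> "$tmpdir/file"

result=$(cat "$tmpdir/file")
echo "$result"    # matches: 2
rm -rf "$tmpdir"
```

If the file already ends in a newline, >> still starts on a fresh line no matter what is done to grep's output.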

Regex for line containing one or more spaces or dashes

I have a .txt file with city names, one per line. Some of them consist of several words separated by one or more spaces, or of words joined with '-'. I need a bash command that prints those lines. Currently I'm using cat piped into grep, but I can't get both spaces and dashes into one search, and I had problems matching multiple spaces.
print lines with dash:
cat file.txt | grep ".*-.*"
print lines with spaces:
cat file.txt | grep ".*\s.*"
But when I try:
cat file.txt | grep ".*\s+.*"
I get nothing.
Thanks for any help.
Something like this should work:
grep -E -- ' |-' file.txt
Explanation:
-E: interpret the pattern as an extended regular expression
--: signals the end of command options
' |-': matches a line that contains either a space or a dash (the dash needs no backslash here)
This does not directly address your question, but is too much to put in a comment.
You don't need the .* in your patterns. A .* at the beginning or end of a pattern is redundant, because it means "zero or more of any character" and so always matches.
These commands are all equivalent (the -- keeps grep from parsing a pattern that starts with - as an option):
cat file.txt | grep ".*-.*"
cat file.txt | grep -- "-.*"
cat file.txt | grep -- "-"
Plus you don't need to cat and pipe:
grep "-" file.txt
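As a quick check of the claim above, the sketch below runs the padded and the bare pattern over the same input and compares the results. The city list is a made-up stand-in for file.txt.

```shell
#!/bin/sh
# Hypothetical city list standing in for file.txt.
cities='Winston-Salem
Paris
San Francisco'

# Pattern padded with .* on both sides.
padded=$(printf '%s\n' "$cities" | grep '.*-.*')
# Bare pattern; -- stops grep from reading the dash as an option.
bare=$(printf '%s\n' "$cities" | grep -- '-')

echo "$padded"    # Winston-Salem
echo "$bare"      # Winston-Salem
```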
When a grep pattern matches, the default action is to print the whole line, so the .* in all your patterns is redundant and you can delete it. You also don't need cat file |, since you can give grep the file directly after the pattern, i.e. grep 'pattern' file.txt.
Here are some more details:
grep ".*-.*" = grep -- "-" - returns any line containing a - char (-- signals the end of options; the next argument is the pattern)
grep ".*\s.*" = grep "\s" - matches and returns lines containing a whitespace char (GNU grep only)
grep ".*\s+.*" = grep "\s+" - returns lines containing a whitespace char followed by a literal + char (this is POSIX BRE syntax, where an unescaped + matches a literal plus sign).
You want
grep "[[:space:]-]" file.txt
Demo:
#!/bin/bash
s='abc - def
ghi
jkl mno'
grep '[[:space:]-]' <<< "$s"
Output:
abc - def
jkl mno
The [[:space:]-] pattern is valid in both POSIX BRE and ERE (the latter enabled with the -E option) and matches either any whitespace char (via the [:space:] POSIX character class) or a hyphen.
Note that [\s-] won't work, since \s inside a bracket expression is not treated as a regex escape sequence but as a literal \ or s.
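To make the BRE/ERE difference concrete, here is a small sketch that counts matching lines for three patterns. The input text is invented for illustration, and [[:space:]] is used instead of \s so that it does not depend on GNU extensions.

```shell
#!/bin/sh
# Hypothetical input; "x +y" has a space followed by a literal +,
# and "p-q" has a hyphen but no whitespace.
s='abc - def
ghi
jkl  mno
x +y
p-q'

# BRE: the unescaped + is a literal plus sign, so only "x +y" matches.
bre=$(printf '%s\n' "$s" | grep -c '[[:space:]]+')
# ERE (-E): + means "one or more", so every line with whitespace matches.
ere=$(printf '%s\n' "$s" | grep -Ec '[[:space:]]+')
# Bracket expression from the answer: whitespace or a hyphen.
either=$(printf '%s\n' "$s" | grep -c '[[:space:]-]')

echo "BRE: $bre, ERE: $ere, bracket: $either"    # BRE: 1, ERE: 3, bracket: 4
```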

How do I capture a substring value from a string in shell (grep/awk/sed)?

New to scripting. I have only one file, and it contains one line. How do I capture the summer.fruit value (i.e. "mango") from the line below and store it in a variable?
.. abc.dfe summer.fruit=mango summer.vegetable=potato projects.blah ...
If your grep supports Perl-compatible regular expressions (PCRE):
summerfruit=$(grep -Po 'summer\.fruit=\K[^ ]+' file)
\K discards everything matched so far, so summer.fruit= is not printed, and [^ ]+ matches one or more non-space characters after the =.
Without PCRE:
summerfruit=$(grep -o 'summer\.fruit=[^ ]*' file | grep -o '[^=]*$')
With sed:
summerfruit=$(sed -n 's/.*summer\.fruit=\([^ ]*\).*/\1/p' file)
With awk:
summerfruit=$(awk '{
for (i=1;i<=NF;i++)
if ($i ~ /^summer\.fruit=/){ sub(/^[^=]*=/,"",$i); print $i; exit }
}' file)
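The sketch below runs the non-PCRE variants against a temporary file to show that they agree. The sample line is a hypothetical, trimmed version of the one in the question, and the sed call uses -n with the p flag so that lines without a match print nothing.

```shell
#!/bin/sh
# Hypothetical one-line input file modelled on the question.
f=$(mktemp)
printf '%s\n' 'abc.dfe summer.fruit=mango summer.vegetable=potato projects.blah' > "$f"

# Two greps: isolate the key=value pair, then keep what follows the "=".
via_grep=$(grep -o 'summer\.fruit=[^ ]*' "$f" | grep -o '[^=]*$')
# sed: capture the value and print only lines where the substitution happened.
via_sed=$(sed -n 's/.*summer\.fruit=\([^ ]*\).*/\1/p' "$f")
# awk: scan the fields, strip the key from the matching one.
via_awk=$(awk '{ for (i = 1; i <= NF; i++)
                   if ($i ~ /^summer\.fruit=/) { sub(/^[^=]*=/, "", $i); print $i; exit } }' "$f")

echo "$via_grep $via_sed $via_awk"    # mango mango mango
rm -f "$f"
```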

How to grep text in a file across line breaks

I have to parse the content of multiple files with this content:
style=3D""><a href=3D"https://123456789.com/accounts/confirm_email/19AbCDx=
K/bWFyY29A1234529zYW50dWNjaS5ldQ/?app_redirect=3DFalse&ndid=3DHMTU1Mjk=
wODY5OTA1MDk2NTptYXJjb0BtYXJjb3NhbnR1Y2NpLmV1Ojg1OQ" style=3D"color:#3b599
I have to extract the https link, but my grep command can't get past the line break, and ends with a truncated result:
COMMAND
grep -r -m1 -oh "https://123456789.com/accounts/confirm_email*\s*[^ ]*" /folder/
RESULT
https://123456789.com/accounts/confirm_email/19AbCDx=
DESIRED RESULT
https://123456789.com/accounts/confirm_email/19AbCDx=K/bWFyY29A1234529zYW50dWNjaS5ldQ/?app_redirect=3DFalse&ndid=3DHMTU1MjkwODY5OTA1MDk2NTptYXJjb0BtYXJjb3NhbnR1Y2NpLmV1Ojg1OQ
PS: the '=' character is not (always) part of the link; it is how the file format marks a line break.
NB: https://123456789.com/accounts/confirm_email/ is the only constant of the link repeated in all files.
If I add the -z option, the -m1 option is ignored and the result is:
https://123456789.com/accounts/confirm_email/19AbCDx=
K/bWFyY29A1234529zYW50dWNjaS5ldQ/?app_redirect=3DFalse&ndid=3DHMTU1Mjk=
wODY5OTA1MDk2NTptYXJjb0BtYXJjb3NhbnR1Y2NpLmV1Ojg1OQ"https://123456789.com/accounts/confirm_email/19AbCDx=
K/bWFyY29A1234529zYW50dWNjaS5ldQ/?app_redirect=3DFalse&ndid=3DHMTU1Mjk=
wODY5OTA1MDk2NTptYXJjb0BtYXJjb3NhbnR1Y2NpLmV1Ojg1OQ"https://123456789.com/accounts/confirm_email/19AbCDx=
K/bWFyY29A1234529zYW50dWNjaS5ldQ/?app_redirect=3DFalse&ndid=3DHMTU1Mjk=
wODY5OTA1MDk2NTptYXJjb0BtYXJjb3NhbnR1Y2NpLmV1Ojg1OQ"
If I add | head -3 after the command it seems to work, BUT the https prefix is repeated at the end of the last line
COMMAND
grep -r -oh -z "https://123456789.com/accounts/confirm_email*\s*[^ ]*" /folder/ | head -3
https://123456789.com/accounts/confirm_email/19AbCDx=
K/bWFyY29A1234529zYW50dWNjaS5ldQ/?app_redirect=3DFalse&ndid=3DHMTU1Mjk=
wODY5OTA1MDk2NTptYXJjb0BtYXJjb3NhbnR1Y2NpLmV1Ojg1OQ"https://123456789.com/accounts/confirm_email/19AbCDx=
How can I exclude it?
man grep:
-z, --null-data
Treat the input as a set of lines, each terminated by a zero
byte (the ASCII NUL character) instead of a newline. - -
So:
$ grep -z -r -m1 -oh "https://123456789.com/accounts/confirm_email*\s*[^ ]*" file
Output:
https://123456789.com/accounts/confirm_email/19AbCDx=
K/bWFyY29A1234529zYW50dWNjaS5ldQ/?app_redirect=3DFalse&ndid=3DHMTU1Mjk=
wODY5OTA1MDk2NTptYXJjb0BtYXJjb3NhbnR1Y2NpLmV1Ojg1OQ"
The newlines will still be there, but you can delete them with tr -d \\n
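A hedged sketch of the whole pipeline, assuming GNU grep and a file that contains a single link. The sample content, the example.com URL, and the choice to stop the match at the closing double quote are illustrative assumptions, not taken from the original files.

```shell
#!/bin/sh
# Hypothetical, shortened sample with one "=" soft line break inside the link.
f=$(mktemp)
printf '%s\n' '<a href=3D"https://example.com/accounts/confirm_email/abc=' \
              'def/?x=3D1" style=3D"c">' > "$f"

# -z treats the whole file as one record, so [^"]* can run across the newline.
# tr drops the NUL byte that -z appends to each match, sed strips the "="
# that marks a line break, and the final tr joins the remaining lines.
url=$(grep -z -o 'https://example\.com/accounts/confirm_email[^"]*' "$f" \
        | tr -d '\000' | sed 's/=$//' | tr -d '\n')

echo "$url"
rm -f "$f"
```

With several links in one file, the matches would be concatenated by the final tr, so they would need to be separated before the newlines are removed.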

How to get lines after match using grep for command line output?

I'm trying to trim the output of a command in the terminal. I want to see only the strings after blah in the command's output. I tried
<command> | grep -A "blah"
but I get this error:
grep: illegal option -- A
I use cut in conjunction with grep to get the string after a keyword, "blah" in this case:
echo "random text string blah strings after" | grep -o "blah.*$" | cut -c 5-
The grep part of the command extracts the rest of the line starting at "blah" (including "blah" itself), and cut removes the first 4 characters of that string. Only the first occurrence of "blah" is used as the delimiter when trimming the line.
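The first-occurrence behaviour can be seen in a short sketch; the input line is made up and contains the keyword twice.

```shell
#!/bin/sh
# Hypothetical input with the keyword appearing twice.
line='random text blah first part blah second part'

# grep -o keeps everything from the first "blah" to the end of the line;
# cut -c 5- then drops the four characters of "blah" itself.
after=$(printf '%s\n' "$line" | grep -o 'blah.*$' | cut -c 5-)

echo "[$after]"    # [ first part blah second part]
```

The space that followed the keyword is kept; cut -c 6- would drop it as well.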

How to grep for a 7 digit hexadecimal string and return only that hexadecimal string?

I am trying to extract all the leading 7 digit hexadecimal strings in a file, that contains lines such as:
3fce110:: ..\Utilities\c\misc.c(431): YESFREED (120 bytes) Misc
grep -Eo '^[0-9a-f]{7}\b' file.txt
-E enables extended regular expressions (the same as the older egrep command, which recent GNU grep releases mark as obsolescent).
-o prints only the matching part of each line.
^ anchors the match to the beginning of the line.
[0-9a-f]{7} matches seven hexadecimal characters. If you want to match uppercase letters add A-F here or add the -i flag.
\b checks for a word boundary; it ensures we don't match hex numbers more than 7 digits long.
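A small sketch of how the \b guard behaves, assuming GNU grep (\b is a GNU extension). The three input lines are invented: one with a 7-digit prefix, one with an 8-digit prefix, and one that is not hexadecimal.

```shell
#!/bin/sh
# Hypothetical input lines.
f=$(mktemp)
printf '%s\n' '3fce110:: misc.c(431): YESFREED' \
              '3fce1101:: misc.c(432): too long' \
              'zzzzzzz:: not hex' > "$f"

# \b rejects the 8-digit prefix: after seven digits the next character
# is still a word character, so there is no word boundary at that point.
hex=$(grep -Eo '^[0-9a-f]{7}\b' "$f")

echo "$hex"    # 3fce110
rm -f "$f"
```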
If all the lines in the file follow the given format, here are a few other methods:
$ grep -o '^[^:]*' file
3fce110
$ awk -F: '{print $1}' file
3fce110
$ cut -d: -f1 file
3fce110
$ sed 's/:.*//' file
3fce110
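The condition that every line follows the format matters, because these methods only split at the first colon and never check the prefix. A sketch with a made-up second line shows the difference from a pattern that does check it:

```shell
#!/bin/sh
# Hypothetical file: one line in the expected format, one that is not.
f=$(mktemp)
printf '%s\n' '3fce110:: misc.c(431): YESFREED (120 bytes) Misc' \
              'header: some other text' > "$f"

# Splitting at the first colon returns the prefix of every line.
by_cut=$(cut -d: -f1 "$f")
# A pattern that checks the prefix keeps only seven hex digits before a colon.
strict=$(grep -Eo '^[0-9a-f]{7}:' "$f" | cut -d: -f1)

echo "$by_cut"    # prints 3fce110, then header
echo "$strict"    # prints 3fce110
rm -f "$f"
```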