I have to parse the content of multiple files with this content:
style=3D""><a href=3D"https://123456789.com/accounts/confirm_email/19AbCDx=
K/bWFyY29A1234529zYW50dWNjaS5ldQ/?app_redirect=3DFalse&ndid=3DHMTU1Mjk=
wODY5OTA1MDk2NTptYXJjb0BtYXJjb3NhbnR1Y2NpLmV1Ojg1OQ" style=3D"color:#3b599
I need to extract the https link, but my grep command stops at the line break and returns a truncated result:
COMMAND
grep -r -m1 -oh "https://123456789.com/accounts/confirm_email*\s*[^ ]*" /folder/
RESULT
https://123456789.com/accounts/confirm_email/19AbCDx=
DESIRED RESULT
https://123456789.com/accounts/confirm_email/19AbCDx=K/bWFyY29A1234529zYW50dWNjaS5ldQ/?app_redirect=3DFalse&ndid=3DHMTU1MjkwODY5OTA1MDk2NTptYXJjb0BtYXJjb3NhbnR1Y2NpLmV1Ojg1OQ
PS: the '=' character is not (always) part of the link; it is what the file format puts at the end of a line when it breaks the line.
NB: https://123456789.com/accounts/confirm_email/ is the only constant of the link repeated in all files.
If I add the -z option, the -m1 option is ignored and the result is:
https://123456789.com/accounts/confirm_email/19AbCDx=
K/bWFyY29A1234529zYW50dWNjaS5ldQ/?app_redirect=3DFalse&ndid=3DHMTU1Mjk=
wODY5OTA1MDk2NTptYXJjb0BtYXJjb3NhbnR1Y2NpLmV1Ojg1OQ"https://123456789.com/accounts/confirm_email/19AbCDx=
K/bWFyY29A1234529zYW50dWNjaS5ldQ/?app_redirect=3DFalse&ndid=3DHMTU1Mjk=
wODY5OTA1MDk2NTptYXJjb0BtYXJjb3NhbnR1Y2NpLmV1Ojg1OQ"https://123456789.com/accounts/confirm_email/19AbCDx=
K/bWFyY29A1234529zYW50dWNjaS5ldQ/?app_redirect=3DFalse&ndid=3DHMTU1Mjk=
wODY5OTA1MDk2NTptYXJjb0BtYXJjb3NhbnR1Y2NpLmV1Ojg1OQ"
If I add | head -3 after the command it seems to work, BUT the https prefix of the next link is repeated at the end of the last line:
COMMAND
grep -r -oh -z "https://123456789.com/accounts/confirm_email*\s*[^ ]*" /folder/ | head -3
https://123456789.com/accounts/confirm_email/19AbCDx=
K/bWFyY29A1234529zYW50dWNjaS5ldQ/?app_redirect=3DFalse&ndid=3DHMTU1Mjk=
wODY5OTA1MDk2NTptYXJjb0BtYXJjb3NhbnR1Y2NpLmV1Ojg1OQ"https://123456789.com/accounts/confirm_email/19AbCDx=
How can I exclude it?
man grep:
-z, --null-data
Treat the input as a set of lines, each terminated by a zero
byte (the ASCII NUL character) instead of a newline.
So:
$ grep -z -r -m1 -oh "https://123456789.com/accounts/confirm_email*\s*[^ ]*" file
Output:
https://123456789.com/accounts/confirm_email/19AbCDx=
K/bWFyY29A1234529zYW50dWNjaS5ldQ/?app_redirect=3DFalse&ndid=3DHMTU1Mjk=
wODY5OTA1MDk2NTptYXJjb0BtYXJjb3NhbnR1Y2NpLmV1Ojg1OQ"
The newlines will still be there, but you can delete them by piping the output through tr -d '\n'
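As a rough sketch of the tr idea above: joining the lines before matching lets a plain grep -o see the whole link, and ending the match at the closing double quote keeps the start of the next link out of the result. The function name extract_link is made up for illustration, and the sketch assumes that every link is terminated by a double quote and that the '=' characters left at the line breaks should simply be kept.

```shell
#!/bin/sh
# extract_link FILE
# Hypothetical helper: print the first confirm_email link found in FILE.
extract_link() {
    # 1. tr joins the wrapped lines into one long line
    #    (carriage returns are removed too, in case the file uses CRLF).
    # 2. grep -o prints each match on its own line; [^"]* stops at the closing quote.
    # 3. head keeps only the first link of the file.
    tr -d '\r\n' < "$1" |
        grep -o 'https://123456789\.com/accounts/confirm_email/[^"]*' |
        head -n 1
}

# If a directory is given as the first argument, process every file below it.
if [ "$#" -gt 0 ]; then
    find "$1" -type f | while IFS= read -r f; do
        extract_link "$f"
    done
fi
```

Saved under an assumed name such as extract_links.sh, it could be run as sh extract_links.sh /folder/ to print one link per file.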
Related
I have a test.txt file with links for example:
google.com?test=
google.com?hello=
and this code
xargs -0 -n1 -a FUZZvul.txt -d '\n' -P 20 -I % curl -ks1L '%/?=DarkLotus' | grep -a 'DarkLotus'
When I put a specific word, such as DarkLotus, in the command, it requests each link in the file and prints the lines of the response in which the word is reflected.
That part works. The problem is that I have many links, and when the results appear in the terminal I cannot tell which site reflected the word DarkLotus.
How can I do that?
Try the -n option. It shows the line number of each matched line.
Best Regards,
Haridas.
I'm not sure what you are up to there, but can you invert it? grep prints matching lines by default. The problem is that you are piping the stdout of the previous commands into grep, so grep has no context about which link produced each line. Since you have a file to work with:
$ grep 'DarkLotus' FUZZvul.txt
If your intention is to also follow the link then it might be easier to write a bash script:
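#!/bin/bash
# Loop over the lines of FUZZvul.txt that contain the word.
for line in $(grep 'DarkLotus' FUZZvul.txt)
do
    # Placeholder: the whole line is taken as the link; extract it here if the line holds more.
    link="${line}"
    echo "${link}"
    curl -ks1L "${link}"
done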
Then you could make your script accept user input:
#!/bin/bash
word="${1}"   # first command-line argument ($0 would be the script name)
for line in $(grep "${word}" FUZZvul.txt)
...
and then
$ my_link_getter "DarkLotus"
https://google?somearg=DarkLotus
...
And then you could make the txt file a parameter.
etc.
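For the original question (which site reflected the word), one possible sketch is to fetch the links one at a time and print a link only when its response contains the word. The names fetch and print_reflecting are hypothetical, and appending the word directly to each link is an assumption based on the sample links ending in '='.

```shell
#!/bin/bash
# fetch URL
# Hypothetical helper that downloads one URL with the curl flags from the question.
# It is a separate function so it can be replaced by a stub for offline testing.
fetch() {
    curl -ks1L "$1"
}

# print_reflecting LINKS_FILE WORD
# Print every link from LINKS_FILE whose response contains WORD.
print_reflecting() {
    local file=$1 word=$2 link
    while IFS= read -r link; do
        [ -n "$link" ] || continue          # skip empty lines
        # Assumption: the word is appended directly to the link.
        if fetch "${link}${word}" | grep -aqF -- "$word"; then
            printf '%s\n' "$link"
        fi
    done < "$file"
}

# Run only when called with exactly two arguments: links file and word.
if [ "$#" -eq 2 ]; then
    print_reflecting "$1" "$2"
fi
```

Because each request is checked on its own, the link is still known when grep reports a match. The price is that the requests run one after another instead of 20 in parallel as with xargs -P 20.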
I was searching for a change that included "foreach" so I used this Mercurial command:
$ hg grep -r "user(mjh) & public() & date(-30)" --diff -i foreach
and it does return the hits where "foreach" was added and removed.
However, I'd like to know the actual commit hashes too. If I add a template:
$ hg grep ... -T '{date|shortdate}\n{node|short}\n{desc|firstline}\n\n'
then I get the commit hash and description as expected, but then I don't see the changed files listed.
Is there a template to capture the output of hg grep? The {files} template lists the files associated with a commit, but that's not the actual grep output. Is there an iterable template keyword available for the grep results?
Please re-read the output of hg help grep -v carefully (the -v option is important) and note the following part, which was new and unexpected for me too:
The following keywords are supported in addition to the common template
keywords and functions. See also 'hg help templates'.
change String. Character denoting insertion "+" or removal "-".
Available if "--diff" is specified.
lineno Integer. Line number of the match.
path String. Repository-absolute path of the file.
texts List of text chunks.
With these keywords you can roughly reproduce the default output of hg grep in your template (only roughly, because some details differ slightly):
>hg grep --diff -i -r 1166 to_try
>hg grep --diff -i -r 1166 -T "{path}:{rev}:{change}:{texts}\n" to_try
hggit/compat.py:1166:-: for args in parameters_to_try:
hggit/compat.py:1166:+: for (args, kwargs) in parameters_to_try:
and after replacing {rev} by {node|short}
>hg grep --diff -i -r 1166 -T "{path}:{node|short}:{change}:{texts}\n" to_try
hggit/compat.py:f6cef55e6aeb:-: for args in parameters_to_try:
hggit/compat.py:f6cef55e6aeb:+: for (args, kwargs) in parameters_to_try:
I am trying to trim the output of a command in the terminal: I want to see only the text that comes after blah in the command's output. I tried
<command> | grep -A "blah"
but I get this error:
grep: illegal option -- A
I am using cut in conjunction with grep to get the text after a keyword, "blah" in this case:
echo "random text string blah strings after" | grep -o "blah.*$" | cut -c 5-
The grep portion of the command extracts the rest of the line starting at "blah" (including "blah" itself), and the cut command removes the first 4 characters of that string. Only the first occurrence of "blah" is used as the delimiter for trimming the line.
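The grep and cut pair above depends on the keyword being exactly 4 characters long. A small sketch that works for a keyword of any length uses awk's index() and substr(). The function name after_keyword is made up for illustration, and the keyword is treated as literal text rather than a regular expression.

```shell
#!/bin/sh
# after_keyword KEYWORD
# Hypothetical helper: for every input line that contains KEYWORD, print the
# text that follows its first occurrence. Lines without KEYWORD are skipped.
after_keyword() {
    awk -v kw="$1" '{
        i = index($0, kw)                 # position of the first occurrence, 0 if absent
        if (i) print substr($0, i + length(kw))
    }'
}

echo "random text string blah strings after" | after_keyword blah
```

As with cut -c 5-, the space that follows the keyword is kept in the output.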
The command 'grep -c jill *' lists all the files, including those with a count of 0, like below.
% grep -c jill *
file1:1
file2:0
file3:0
file4:0
file5:0
file6:1
%
What I want is:
% grep -c jill * | grep -v ':0'
file1:1
file6:1
%
Instead of piping and grep'ing the output like above, is there a flag to suppress listing files with 0 counts?
SJ
How to grep nonzero counts:
grep -rIcH 'string' . | grep -v ':0$'
-r Recurse subdirectories.
-I Ignore binary files (thanks #tongpu, warlock).
-c Show count of matches. Annoyingly, includes 0-count files.
-H Show file name, even if only one file (thanks #CraigEstey).
'string' your string goes here.
. Start from the current directory.
| grep -v ':0$' Remove 0-count files. (thanks #LaurentiuRoescu)
(I realize the OP was excluding the pipe trick, but this is what works for me.)
Just use awk. e.g. with GNU awk for ENDFILE:
awk '/jill/{c++} ENDFILE{if (c) print FILENAME":"c; c=0}' *
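ENDFILE is a GNU awk extension. Where only a POSIX awk is available, a similar result can be sketched by noticing the start of each new file through FNR == 1. The function name count_nonzero is made up for illustration, and the pattern is used as an awk regular expression.

```shell
#!/bin/sh
# count_nonzero PATTERN FILE...
# Hypothetical helper: print "file:count" for every file that has at least
# one line matching PATTERN, using only POSIX awk features.
count_nonzero() {
    pat=$1
    shift
    awk -v pat="$pat" '
        # First line of a later file: report the previous file if it had matches.
        FNR == 1 && NR > 1 { if (c) print prev ":" c; c = 0 }
        { prev = FILENAME }               # remember which file this line came from
        $0 ~ pat { c++ }                  # count matching lines
        END { if (c) print prev ":" c }   # report the last file
    ' "$@"
}
```

It would be called as count_nonzero jill * in the directory from the question. Empty files never trigger FNR == 1, which is harmless here because their count is 0 anyway.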
How do I grep the line that follows the # symbol?
I thought this should work: grep -A # file
#SRR797059.1 HWIEAS269_0001:5:1:1049:4995 length=38
CGAGCTCCGGCTCGGAGGACCATACTATCGTATGCNGN
+SRR797059.1 HWIEAS269_0001:5:1:1049:4995 length=38
bbbbbbbbbbbbbb^bb]_^aR_]_b_b[_BBBBBBBB
#SRR797059.2 HWIEAS269_0001:5:1:1057:20746 length=38
GGATCTGTAAACATCCTCGACTGGAAGCTTACTATCGT
output
CGAGCTCCGGCTCGGAGGACCATACTATCGTATGCNGN
GGATCTGTAAACATCCTCGACTGGAAGCTTACTATCGT
The -A option needs a number after it, which gives the number of lines to print after each match.
From the man page:
> -A num, --after-context=num
> Print num lines of trailing context after each match.
So you should try:
$ grep -A 1 '#' file
#SRR797059.1 HWIEAS269_0001:5:1:1049:4995 length=38
CGAGCTCCGGCTCGGAGGACCATACTATCGTATGCNGN
--
#SRR797059.2 HWIEAS269_0001:5:1:1057:20746 length=38
GGATCTGTAAACATCCTCGACTGGAAGCTTACTATCGT
Answer for updated question:
$ awk 'p;{p=(/#/?1:0)}' file
CGAGCTCCGGCTCGGAGGACCATACTATCGTATGCNGN
GGATCTGTAAACATCCTCGACTGGAAGCTTACTATCGT
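The flag p is set on every line that contains # and cleared on every other line. Because the print test (p;) runs before the flag is updated, each line that follows a # line is printed.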
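The same idea can also be sketched with sed. The function name lines_after_hash is made up for illustration, and the pattern is anchored with ^ on the assumption that header lines always start with #, so that a # inside a quality line does not count as a header.

```shell
#!/bin/sh
# lines_after_hash FILE
# Hypothetical helper: print the line that follows each line starting with '#'.
lines_after_hash() {
    # -n suppresses automatic printing; on a header line, n loads the next
    # line into the pattern space and p prints it.
    sed -n '/^#/{n;p;}' "$1"
}

if [ "$#" -eq 1 ]; then
    lines_after_hash "$1"
fi
```

Unlike grep -A 1, this prints neither the header lines nor the -- separators.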