Here is the situation: I have to find, in the output of a hexdump, the bytes between a string A and a string B. The structure of the hexdump is something like:
-random bytes
-A + useful bytes + B
-random bytes
-A + useful bytes + B
-random bytes
And now, the questions:
- Is it possible to grep "from A to B"? I haven't seen anything like that in the man page or on the internet. I know I can do it manually, but I need to script it.
- Is it possible to show the hexdump output without the line numbers? It seems very reasonable, but I haven't found the way to do it.
Thanks!
You can use Perl-like lookaround assertions to match everything between A and B, not including A and B:
$ echo 'TEST test A foo bar B test' | grep -oP '(?<=A).*(?=B)'
foo bar
However, taking Michael's answer into account, you'll have to convert the hexdump output to a single string to use grep. You can strip off the 'line numbers' on your way:
hexdump filename | sed -r 's/\S{5,}//g' | tr '\n' ' '
or better
hexdump filename | cut -d ' ' -f 2- | tr '\n' ' '
Now everything is on one line, so grep has to be lazy, not greedy:
$ echo 'TEST test A foo bar B test A bar foo B test' | grep -oP '(?<=A).*?(?=B)'
foo bar
bar foo
But Michael has a point, maybe you should use something more high-level, at least if you need to do it more than once.
P.S. If you are OK with including A and B in the match, just do
$ echo 'TEST test A foo bar B test A bar foo B test' | grep -oP 'A.*?B'
A foo bar B
A bar foo B
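Putting the pieces together, a rough end-to-end sketch for the hexdump case (this assumes A and B appear as literal text in the hexdump output and that filename is the dump source; substitute the actual byte patterns you are after):
hexdump filename | cut -d ' ' -f 2- | tr '\n' ' ' | grep -oP '(?<=A).*?(?=B)'
Each match is one stretch of dump text between an occurrence of A and the following B; swap in the P.S. variant above if you want A and B included.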
grep, the program, only works on one line at a time; you won't be able to get it to work intelligently on a hex dump.
My suggestion: use the regex functionality in Perl or Ruby or your favorite scripting language to grep the raw binary data for the string. This example is in Ruby:
ARGF.read.force_encoding("BINARY").scan(/STR1(.*?)STR2/m); # the /m flag lets . also match newline bytes in the binary data
This will produce an array containing all the binary strings between occurrences of STR1 and STR2. From there you could run each one through hexdump(1).
original string :
A/trunk/apple/B/trunk/apple/Z/trunk/orange/citrus/Q/trunk/melon/juice/venti/straw/
Depth of directories will vary, but /trunk part will always remain the same.
And a single character in front of /trunk is the indicator of that line.
desired output :
A /trunk/apple
B /trunk/apple
Z /trunk/orange
Q /trunk/melon/juice/venti/straw
*** edit
I'm sorry, I made a mistake by adding a slash at the end of each path in the original string, which made the output confusing. The original string didn't have a slash in front of the capital letters, but I'll leave it as it is.
my attempt :
echo $str1 | sed 's/\(.\/trunk\)/\n\1/g'
I feel like it should work but it doesn't.
With GNU awk for multi-char RS and RT:
$ awk -v RS='([^/]+/){2}[^/\n]+' 'RT{sub("/",OFS,RT); print RT}' file
A trunk/apple
B trunk/apple
Z trunk/orange
I'm setting RS to a regexp describing each string you want to match, i.e. 2 repetitions of non-/ characters followed by /, and then a final string of non-/ characters (and non-newline, for the last string on the input line). RT is automatically set to each matching string, so then I just change the first / to a blank and print the result.
If each path isn't always 3 levels deep but does always start with something/trunk/, e.g.:
$ cat file
A/trunk/apple/banana/B/trunk/apple/Z/trunk/orange
then:
$ awk -v RS='[^/]+/trunk/' 'RT{if (NR>1) print pfx $0; pfx=gensub("/"," ",1,RT)} END{printf "%s%s", pfx, $0}' file
A trunk/apple/banana/
B trunk/apple/
Z trunk/orange
To deal with more complex sample input, where there could be any number of / and values after trunk in a single line, please try the following.
awk '
{
gsub(/[^/]*\/trunk/,OFS"&")
sub(/^ /,"")
sub(/\//,OFS"&")
gsub(/ +[^/]*\/trunk\/[^[:space:]]+/,"\n&")
sub(/\n/,OFS)
gsub(/\n /,ORS)
gsub(/\/trunk/,OFS"&")
sub(/[[:space:]]+/,OFS)
}
1
' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
{
gsub(/[^/]*\/trunk/,OFS"&") ##Globally substituting everything from / to till next / followed by trunk/ with space and matched value.
sub(/^ /,"") ##Substituting starting space with NULL here.
sub(/\//,OFS"&") ##Substituting first / with space / here.
gsub(/ +[^/]*\/trunk\/[^[:space:]]+/,"\n&") ##Globally substituting spaces, then non-/ characters, then /trunk/ and everything up to the next space, with a newline plus the matched text.
sub(/\n/,OFS) ##Substituting new line with space.
gsub(/\n /,ORS) ##Globally substituting new line space with ORS.
gsub(/\/trunk/,OFS"&") ##Globally substituting /trunk with OFS and matched value.
sub(/[[:space:]]+/,OFS) ##Substituting spaces with OFS here.
}
1 ##Printing edited/non-edited line here.
' Input_file ##Mentioning Input_file name here.
With your shown samples, please try the following awk code.
awk '{gsub(/\/trunk/,OFS "&");gsub(/trunk\/[^/]*\//,"&\n")} 1' Input_file
In awk you can try this solution. It deals with the special requirement of removing forward slashes when the next character is upper case. Will not win a design award but works.
$ echo "A/trunk/apple/B/trunk/apple/Z/trunk/orange" |
awk -F '' '{ x=""; for(i=1;i<=NF;i++){
if($(i+1)~/[A-Z]/&&$i=="/"){$i=""};
if($i~/[A-Z]/){ printf x""$i" "}
else{ x="\n"; printf $i } }; print "" }'
A /trunk/apple
B /trunk/apple
Z /trunk/orange
Also works for n words. Actually works with anything that follows the given pattern.
$ echo "A/fruits/apple/mango/B/anything/apple/pear/banana/Z/ball/orange/anything" |
awk -F '' '{ x=""; for(i=1;i<=NF;i++){
if($(i+1)~/[A-Z]/&&$i=="/"){$i=""};
if($i~/[A-Z]/){ printf x""$i" "}
else{ x="\n"; printf $i } }; print "" }'
A /fruits/apple/mango
B /anything/apple/pear/banana
Z /ball/orange/anything
This might work for you (GNU sed):
sed 's/[^/]*/& /;s/\//\n/3;P;D' file
Separate the first word from the first / by a space.
Replace the third / by a newline.
Print/delete the first line and repeat.
If the first word has the property that it is only one character long:
sed 's/./& /;s#/\(./\)#\n\1#;P;D' file
Or if the first word has the property that it begins with an upper case character:
sed 's/[[:upper:]][^/]*/& /;s#/\([[:upper:]][^/]*/\)#\n\1#;P;D' file
Or if the first word has the property that it is followed by /trunk/:
sed -E 's#([^/]*)(/trunk/)#\n\1 \2#g;s/.//' file
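For instance, the last variant run against the sample string from the question would be expected to produce (trailing slashes kept, since nothing strips them):
$ echo 'A/trunk/apple/B/trunk/apple/Z/trunk/orange/citrus/Q/trunk/melon/juice/venti/straw/' | sed -E 's#([^/]*)(/trunk/)#\n\1 \2#g;s/.//'
A /trunk/apple/
B /trunk/apple/
Z /trunk/orange/citrus/
Q /trunk/melon/juice/venti/straw/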
With GNU sed:
$ str="A/trunk/apple/B/trunk/apple/Z/trunk/orange/citrus/Q/trunk/melon/juice/venti/straw/"
$ sed -E 's|/?(.)(/trunk/)|\n\1 \2|g;s|/$||' <<< "$str"
A /trunk/apple
B /trunk/apple
Z /trunk/orange/citrus
Q /trunk/melon/juice/venti/straw
Note the first empty output line. If it is undesirable we can separate the processing of the first output line:
$ sed -E 's|(.)|\1 |;s|/(.)(/trunk/)|\n\1 \2|g;s|/$||' <<< "$str"
A /trunk/apple
B /trunk/apple
Z /trunk/orange/citrus
Q /trunk/melon/juice/venti/straw
Using gnu awk you could use FPAT to set contents of each field using a pattern.
When looping over the fields, replace the first / with a space followed by /.
str1="A/trunk/apple/B/trunk/apple/Z/trunk/orange"
echo $str1 | awk -v FPAT='[^/]+/trunk/[^/]+' '{
for(i=1;i<=NF;i++) {
sub("/", " /", $i)
print $i
}
}'
The pattern matches
[^/]+ Match 1 or more chars other than /
/trunk/[^/]+ Match /trunk/ followed by 1 or more chars other than /
Output
A /trunk/apple
B /trunk/apple
Z /trunk/orange
Other patterns that can be used by FPAT after the updated question:
Matching a word boundary \\<, an uppercase char A-Z, then /trunk, then repetitions of / followed by lowercase chars:
FPAT='\\<[A-Z]/trunk(/[a-z]+)*'
If the length of the strings for the directories after /trunk is at least 2 characters:
FPAT='\\<[A-Z]/trunk(/[^/]{2,})*'
If there can be no separate folders that consist of a single uppercase char A-Z
FPAT='\\<[A-Z]/trunk(/([^/A-Z][^/]*|[^/]{2,}))*'
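As a sketch, the first of these patterns plugged into the same loop as before (assuming the updated sample string from the question is in str):
str='A/trunk/apple/B/trunk/apple/Z/trunk/orange/citrus/Q/trunk/melon/juice/venti/straw/'
echo "$str" | awk -v FPAT='\\<[A-Z]/trunk(/[a-z]+)*' '{
for(i=1;i<=NF;i++) {
sub("/", " /", $i)
print $i
}
}'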
Output
A /trunk/apple
B /trunk/apple
Z /trunk/orange/citrus
Q /trunk/melon/juice/venti/straw
Assuming your data will always be in the format provided as a single string, you can try this sed.
$ sed 's/$/\//;s|\([A-Z]\)\([a-z/]*\)/\([a-z]*\?\)|\1 \2\3\n|g' input_file
$ echo "A/trunk/apple/pine/skunk/B/trunk/runk/bunk/apple/Z/trunk/orange/T/fruits/apple/mango/P/anything/apple/pear/banana/L/ball/orange/anything/S/fruits/apple/mango/B/rupert/cream/travel/scout/H/tall/mountains/pottery/barnes" | sed 's/$/\//;s|\([A-Z]\)\([a-z/]*\)/\([a-z]*\?\)|\1 \2\3\n|g'
A /trunk/apple/pine/skunk
B /trunk/runk/bunk/apple
Z /trunk/orange
T /fruits/apple/mango
P /anything/apple/pear/banana
L /ball/orange/anything
S /fruits/apple/mango
B /rupert/cream/travel/scout
H /tall/mountains/pottery/barnes
Some fun with Perl, where you can use a non-consuming regex to autosplit into the @F array, then just print however you want.
perl -lanF'/(?=.{1,2}trunk)/' -e 'print "$F[2*$_] $F[2*$_+1]" for 0..$#F/2'
Step #1: Split
perl -lanF'/(?=.{1,2}trunk)/'
This will take the input stream, and split each line whenever the pattern .{1,2}trunk is encountered
Because we want to retain trunk and the preceding 1 or 2 chars, we wrap the split pattern in (?=) for a non-consuming forward lookahead
This splits things up this way:
$ echo A/trunk/apple/B/trunk/apple/Z/trunk/orange/citrus/Q/trunk/melon/juice/venti/straw/ | perl -lanF'/(?=.{1,2}trunk)/' -e 'print join " ", @F'
A /trunk/apple/ B /trunk/apple/ Z /trunk/orange/citrus/ Q /trunk/melon/juice/venti/straw/
Step 2: Format output:
The @F array contains pairs that we want to print in order, so we'll iterate over half of the array indices, and print 2 at a time:
print "$F[2*$_] $F[2*$_+1]" for 0..$#F/2 --> Double the iterator, and print pairs
using perl -l means each print has an implicit \n at the end
The results:
$ echo A/trunk/apple/B/trunk/apple/Z/trunk/orange/citrus/Q/trunk/melon/juice/venti/straw/ | perl -lanF'/(?=.{1,2}trunk)/' -e 'print "$F[2*$_] $F[2*$_+1]" for 0..$#F/2'
A /trunk/apple/
B /trunk/apple/
Z /trunk/orange/citrus/
Q /trunk/melon/juice/venti/straw/
Endnote: Perl obfuscation that didn't work.
Any array in Perl can be assigned to a hash, interpreted as (key,val,key,val....)
So %F=@F; print "$_ $F{$_}" for keys %F seems like it would be really slick
But you lose order:
$ echo A/trunk/apple/B/trunk/apple/Z/trunk/orange/citrus/Q/trunk/melon/juice/venti/straw/ | perl -lanF'/(?=.{1,2}trunk)/' -e '%F=@F; print "$_ $F{$_}" for keys %F'
Z /trunk/orange/citrus/
A /trunk/apple/
Q /trunk/melon/juice/venti/straw/
B /trunk/apple/
Update
With your new data file:
$ cat file
A/trunk/apple/B/trunk/apple/Z/trunk/orange/citrus/Q/trunk/melon/juice/venti/straw/
This GNU awk solution:
awk '
{
sub(/[/]$/,"")
gsub(/[[:upper:]]{1}/,"& ")
print gensub(/([/])([[:upper:]])/,"\n\\2","g")
}' file
A /trunk/apple
B /trunk/apple
Z /trunk/orange/citrus
Q /trunk/melon/juice/venti/straw
I have this text file:
# cat letter.txt
this
is
just
a
test
to
check
if
grep
works
The letter "e" appear in 3 words.
# grep e letter.txt
test
check
grep
Is there any way to return the letter printed to the left of the matched character?
expected.txt
t
h
r
With your shown samples, in awk, could you please try the following.
awk '/e/{print substr($0,index($0,"e")-1,1)}' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
/e/{ ##Looking if current line has e in it then do following.
print substr($0,index($0,"e")-1,1) ##Print the single character just before the first "e" on the line.
}
' Input_file ##Mentioning Input_file name here.
You can use positive lookahead to match a character that is followed by an e, without making the e part of the match.
grep -oP '.(?=e)' letter.txt
With sed:
sed -nE 's/.*(.)e.*/\1/p' letter.txt
Assuming you have this input file:
cat file
this
is
just
a
test
to
check
if
grep
works
egg
element
You may use this grep + sed solution to find letter or empty string before e:
grep -oE '(^|.)e' file | sed 's/.$//'
t
h
r
l
m
Or alternatively this single awk command should also work:
awk -F 'e' 'NF > 1 {
for (i=1; i<NF; i++) print substr($i, length($i), 1)
}' file
This might work for you (GNU sed):
sed -nE '/(.)e/{s//\n\1\n/;s/^[^\n]*\n//;P;D}' file
Turn off implicit printing and enable extended regexp -nE.
Focus only on lines that meet the requirements i.e. contain a character before e.
Surround the required character by newlines.
Remove any characters before and including the first newline.
Print the first line (up to the second newline).
Delete the first line (including the newline).
Repeat.
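For example, run against letter.txt from the question, this would be expected to print:
$ sed -nE '/(.)e/{s//\n\1\n/;s/^[^\n]*\n//;P;D}' letter.txt
t
h
r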
N.B. The solution will print each such character on a separate line.
To print all such characters on their own line, use:
sed -nE '/(.e)/{s//\n\1/g;s/^/e/;s/e[^\n]*\n?//g;s/\B/ /g;p}' file
N.B. Remove the s/\B/ /g if space separation is not needed.
With GNU awk you can use empty string as FS to split the input as individual characters:
awk -v FS= '/[e]/ {for(i=2;i<=NF;i++) if ($i=="e") print $(i-1)}' file
t
h
r
Excluding "e" at the beginning in the for loop.
edited
empty string if e is the first character in the word.
For example, this input:
cat file2
grep
erroneously
egg
Wednesday
effectively
awk -v FS= '/^[e]/ {print ""} /[e]/ {for(i=2;i<=NF;i++) if ($i=="e") print $(i-1)}' file2
r
n
W
n
f
v
I have some files, and I want grep to return the lines where there is at least one Position:"Engineer" field AND at least one Position field whose value is not equal to "Engineer".
So for the file below it should return only the first line:
Position:"Engineer" Name:"Jes" Position:"Accountant" Name:"Criss"
Position:"Engineer" Name:"Eva" Position:"Engineer" Name:"Adam"
I could write something like
grep 'Position:"Engineer"' filename | grep 'Position:"Accountant"'
And this works fine (I get only the first line), but the thing is I don't know all of the possible values of Position, so the grep needs to be generic, something like
grep 'Position:"Engineer"' filename | grep -v 'Position:"Engineer"'
But this doesn't return anything (as the two greps contradict each other)
Do you have any idea how this can be done?
This line works:
grep "^Position:\"Engineer\"" filename | grep -v " Position:\"Engineer\""
The first expression, anchored with "^", catches only a Position at the beginning of the line; the second expression, with a leading " " space, removes lines that contain a second Position:"Engineer".
You can avoid the pipe and additional subshell by using awk if that is allowed, e.g.
awk '
$1~/Engineer/ {if ($3~/Engineer/) next; print}
$3~/Engineer/ {if ($1~/Engineer/) next; print}
' file
The above just checks if the first field contains Engineer; if so, it checks whether field 3 also contains Engineer, and if it does, skips the record, otherwise prints it. The second rule just swaps the order of the tests. The result of the tests is that Engineer can only appear in one of the fields (either the first or the third, but not both).
Example Use/Output
With your sample input in file, you would have:
$ awk '
$1~/Engineer/ {if ($3~/Engineer/) next; print}
$3~/Engineer/ {if ($1~/Engineer/) next; print}
' file
Position:"Engineer" Name:"Jes" Position:"Accountant" Name:"Criss"
Use a negative lookahead to exclude a pattern after the match.
grep 'Position:"Engineer"' filename | grep -P 'Position:"(?!Engineer)'
With two greps in a pipe:
grep -F 'Position:"Engineer"' file | grep -Ev '(Position:"[^"]*").*\1'
or, perhaps more robustly
grep -F 'Position:"Engineer"' file | grep -v 'Position:"Engineer".*Position:"Engineer"'
In general case, if you want to print the lines with unique Position fields,
grep -Ev '(Position:"[^"]*").*\1' file
should do the job, assuming all the lines have the format specified. This will work also when there are more than two Position fields in the line.
I have the following issue: I need to retrieve all words that contain exactly 2 vowels (in any order) from a file. The file only contains one word per line.
My current workaround is:
Grep1: Retrieve words such as earth, over, under, one...
grep -i "^[aeiou][^aeiou]*[aeiou][^aeiou]*$" genesis.words > A.txt
and
Grep2: Retrieve words such as formless, deep, said...
grep -i "^[^aeiou][^aeiou]*[aeiou][^aeiou]*[aeiou][^aeiou]*$" genesis.words > B.txt
The above solution works, but when I concatenate both regexes into a single regex it returns nothing!
Mother of Grep1 & Grep2: should retrieve everything!
grep -i "^[aeiou][^aeiou]*[aeiou][^aeiou]*$|^[^aeiou][^aeiou]*[aeiou][^aeiou]*[aeiou][^aeiou]*$" genesis.words
I think the issue is around my use of ^ and $ in the expression, but I have tried different versions with no success!
Any help will be highly appreciated!
OS is AIX 6100-09-04-1441
You were close. This should work:
grep -i "^[^aeiou]*[aeiou][^aeiou]*[aeiou][^aeiou]*$" genesis.words > A.txt
So it should find all eight possibilities (the two vowels delimit three nonvowel sequences, each possibly empty; 2^3 is 8):
[ ]I[ ]o[ ]
[ ]e[ ]a[r]
[ ]e[r]a[ ]
[ ]e[l]a[n]
[T]e[ ]a[ ]
[D]e[ ]a[r]
[D]e[w]a[r]
[D]a[w]a[ ]
[H]a[w]a[y]
As for concatenation, | needs escaping in a basic regex, and the grouping parentheses do too. You can then use a single pair of anchors:
^\(regexp1\|regexp2\)$
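So a combined command could look like this (a sketch relying on \| alternation, which GNU grep supports in basic regexes; a strictly POSIX grep, as on AIX, may not):
grep -i "^\([aeiou][^aeiou]*[aeiou][^aeiou]*\|[^aeiou][^aeiou]*[aeiou][^aeiou]*[aeiou][^aeiou]*\)$" genesis.words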
Since the * can match 0 times or more you should be able to start the string with [^aeiou]*: try
"^[^aeiou]*[aeiou][^aeiou]*[aeiou][^aeiou]*$"
As for fixing your regex, I think you need to escape the bar as \|, so
grep -i "^[aeiou][^aeiou]*[aeiou][^aeiou]*$\|^[^aeiou][^aeiou]*[aeiou][^aeiou]*[aeiou][^aeiou]*$" genesis.words
If you don't mind Perl, you could use this:
perl -lne '$m=$_; tr/aeiou//cd; print $m if length()==2;' /usr/share/dict/words
That says... "save the current line (word) in $m. Delete everything that is not a vowel. Print the original word if there are two things (i.e. vowels) left."
Note that I am using the system dictionary as input for my tests.
You could do pretty much the same thing in awk.
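For example, a rough awk sketch of the same idea, counting the vowels with gsub and printing the word when the count is exactly two (case-folded, like the grep -i in the question):
awk '{tmp=tolower($0); if (gsub(/[aeiou]/,"",tmp)==2) print}' /usr/share/dict/words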
If you're able to use an alternative to grep, tr with wc works well:
words=/path/to/words.txt
while read -r word ; do
v=$(echo "$word" | tr -cd 'aeiou' | wc -c)
[[ $v -eq 2 ]] && echo "$word" >> output.txt
done < "$words"
This reads the original file line by line, counts the vowels, and writes the words with exactly 2 vowels to output.txt.
I need to find some matching conditions from a file and then recursively find the next conditions in the previously matched files. I have something like this:
input.txt
123
22
33
The above terms need to be found in the following files; the challenge is that if 123 is found in, say, 10 files, then 22 should be searched for in those 10 files only, and so on...
Example of files are like f1,f2,f3,f4.....f1200
so it is like I need to grep -w "123" f* | grep -w "22" | .....
It's not possible to list them manually, so is there any easier way?
You can solve this using an awk script; I've encountered a similar problem and this will work fine
awk '{ if(NR>1){printf("|")}; printf("grep -w %d f*",$1) }' input.txt | sh
What it does:
it reads input.txt line by line
before every record except the first it prints a | (pipe), so the individual grep commands get chained together
for each record it prints grep -w <number> f*
the assembled command line is then piped to sh for execution
Perhaps taking a meta-programming viewpoint would help. Have grep output a series of grep commands. Or write a little Perl program. Maybe Ruby, if the mood suits.
You can use grep -lw to write the list of file names that matched (note that grep stops reading each file after it finds the first match in that file).
You capture the list of file names and use that for the next iteration in a loop.
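A rough sketch of that loop (assuming the search terms sit one per line in input.txt, the data files match f*, and no file name contains whitespace):
files=$(printf '%s\n' f*)
while read -r pat && [ -n "$files" ]; do
files=$(grep -lw "$pat" $files)   # keep only the files that contain the current term
done < input.txt
printf '%s\n' $files   # the files that contained every term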