I'm trying to run this command to do some cleanups.
egrep -v -f ref_file.css my_file.css
However, it is giving me an error.
egrep: Unmatched ( or \ (
How can I go around that? I'm on mac terminal.
Thanks,
Tee
I know I'm a bit late, but maybe this can help other users looking for the same...
If you just want to check the differences between two files, then you can use diff, as mentioned in the comments. However, if the files are somewhat similar, and you are looking for a way to check the differences visually, then you can use sdiff file1 file2 to get a diff-ish output but showing the files side by side.
In the output, | between the two files means that the line is somewhat shared between the two files, but with differences in the text or format.
< and > means that the line to the left or to the right exists in the left or the right file, and not in the other, as you may have expected.
You may then grep the output from sdiff for ' | ', ' < ', and ' > ', and analize that in case you need to do further processing on the files...
Related
I'm trying to reduce a .sm file1 - around 10 GB by filtering it using a fair long set of words (around 180.108 items) listed in a text file file2.
File1 is structured as follows:
word <http://internet.address.com> 1
i.e. one word followed by a blank space, an internet address, and a number.
File2 is a simple .txt file, a list of words, one on each line.
My aim is to create a third file File3 containing only those lines in file1 whose first word matches with the word-list of file2, and disregard the rest.
My attempt is the following:
grep -w -F -f file2.txt file1.sm > file3.sm
I've also attempted something along this line:
gawk 'FNR==NR {a[$1]; next } !($2 in a)' file2.txt file1.sm > file3.sm
but with no success. I understand /^ and \b might play a part here, but I don't know how to fit them in the syntax. I've looked around extensively but no solution seems to fit.
My problem is that here grep reads the entire file1's line, and it can happen that the matching word lies in the webpage address, which I'm not interested in finding out.
sed 's/^/^/' file2.txt | grep -f - file1.sm
join is the best tool for this, not grep/awk:
join -t' ' <(sort file1.sm) <(sort file2.txt) >file3.sm
I created a test file with the following:
<cert>
</cert>
I'm now trying to find this with grep and the following command, but it take forever to run.
How can I search quickly for files that contain adjacent lines like these?
tr -d '\n' | grep '<cert></cert>' test.test
So, from the comments, you're trying to get the filenames that contain an empty <cert>..</cert> element. You're using several tools wrong. As #iiSeymour pointed out, tr only reads from standard input-- so if you want to use it to select from lots of filenames, you'll need to use a loop. grep prints out matching lines, not filenames; though you could use grep -l to see the filenames instead.
But you're only joining lines because grep works one line at a time; so let's use a better tool. Here's how to search with awk:
awk '/<cert>/ { started=1; }
/<\/cert>/ { if (started) { print FILENAME; nextfile;} }
!/<cert>/ { started = 0; }' file1 file2 *.txt
It checks each line and keeps track of whether the previous line matched <cert>. (!/pattern/ sets the flag back to zero on lines not matching /pattern/.) Call it with all your files (or with a wildcard like *.txt).
And a friendly suggestion: Next time, try each command separately (you've been stuck on this for hours and you still don't know what grep does?). And have a quick look at the manual for the tools you want to use. Unix tools are usually too complex for simple trial and error.
I have literally been at this for 5 hours, I have busybox on my device, and I unfortunately do not have -X in grep to make my life easier.
edit;
I have two list both of them have mac addresses, essentially I am just wanting to achieve offline mac address lookup so I don't have to keep looking it up online
list.txt has vendor mac prefix of course this isn't the complete list but just for an example
00:13:46
00:15:E9
00:17:9A
00:19:5B
00:1B:11
00:1C:F0
scan will have list of different mac addresses unknown to which vendor they go to. Which will be full length mac addresses. when ever there is a match I want the line in scan to be output.
Pretty much it does that, but it outputs everything from the scan file, and then it will output matching one at the end, and causing duplicate. I tried sort -u, but it has no effect its as if there is two different output from two different methods, the reason why I say that is because it will instantly output scan file that has everything in it, and couple seconds later it will output the matching one.
From searching I came across this
#!/bin/bash
while read line; do
grep -F 'list' 'scan'
done < list.txt
which displays the duplicate result when/if found, the output is pretty much echoing my scan file then displaying the matched pattern, this creating duplicate
This is frustrating me that I have not found a solution after click on all the links in google up to page 9.
Please someone help me.
I don't know if the Busybox sed supports this out of the box, but it should be easy to do in Awk or Perl instead then.
Create a sed script to print lines from file2 which are covered by a prefix in file1 by transforming each line in file1 into a sed command to print a match for that regular expression:
sed 's%.*%/&/p%' file1 | sed -n -f - file2
The same in Awk:
awk 'NR==FNR { a[++i]="^" $0; next }
{ for (j=1; j<=i; ++j) if ($0 ~ a[j]) print }' file1 file2
Ok guys I did a nested for loop (probably very in efficient) but I got it working printing the matching mac addresses using this
#!/usr/bin/bash
for scanlist in `cat scan | cut -d: -f1,2,3`
do
for listt in `cat list`
do
if [[ $scanlist == $listt ]]; then
grep $scanlist scan
fi
done
done
if anyone can make this more elegant but it works for me for now. I think the problem I had was one list contained just 00:11:22 while my other list contained 00:11:22:33:44:55 that is why I cut it on my scanlist to make same length as my other list. So this only output the matches instead of doing duplicate output.
I need to find some matching conditions from a file and recursively find the next conditions in previously matched files , i have something like this
input.txt
123
22
33
The files where you need to find above terms in following files, the challenge is if 123 is found in say 10 files , the 22 should be searched in these 10 files only and so on...
Example of files are like f1,f2,f3,f4.....f1200
so it is like i need to grep -w "123" f* | grep -w "123" | .....
its not possible to list them manually so any easier way?
You can solve this using awk script, i ve encountered a similar problem and this will work fine
awk '{ if(!NR){printf("grep -w %d f*|",$1)} else {printf("grep -w %d f*",$1)} }' input.txt | sh
What it Does?
it reads input.txt line by line
until it is at last record , it prints grep -w %d | (note there is a
pipe here)
which is then sent to shell for execution and results are piped back
to back
and when you reach the end the pipe is avoided
Perhaps taking a meta-programming viewpoint would help. Have grep output a series of grep commands. Or write a little PERL program. Maybe Ruby, if the mood suits.
You can use grep -lw to write the list of file names that matched (note that it will stop after finding the first match).
You capture the list of file names and use that for the next iteration in a loop.
I am trying to clean up a legacy database by dropping all procedures that are not used by the application. Using grep, I have been able to determine that a single procedure does not occur in the source code. Is there a way to do this for all of the procedures at once?
UPDATE: While using -E "proc1|proc2" produces an output of all lines in all files which match either pattern, this is not very useful. The legacy database has 2000+ procedures.
I tried to use the -o option thinking that I could use its output as the pattern for an inverse search on the original pattern. However, I found that there is no output when you use the -o option with more than one pattern.
Any other ideas?
UPDATE: After further experimenting, I found that it is the combination of the -i and -o options which are preventing the output. Unfortunately, I need a case insensitive search in this context.
feed the list of stored procedures to egrep separated by "|"
or:
for stored_proc in $stored_procs
do
grep $stored_proc $source_file
done
I've had to do this in the past as well. Don't forget about any procs that may be called from other procs.
If you are using SQL Server you can use this:
SELECT name,
text
FROM sysobjects A
JOIN syscomments B
ON A.id = B.id
WHERE xtype = 'P'
AND text LIKE '%< sproc name >%'
I get output under the circumstances described in your edit:
$ echo "aaaproc1bbb" | grep -Eo 'proc1|proc2'
proc1
$ echo $?
0
$ echo "aaabbb" | grep -Eo 'proc1|proc2'
$ echo $?
1
The exit code shows if there was no match.
You might also find these options to grep useful (-L may be specific to GNU grep):
-c, --count
Suppress normal output; instead print a count of matching lines
for each input file. With the -v, --invert-match option (see
below), count non-matching lines. (-c is specified by POSIX.)
-L, --files-without-match
Suppress normal output; instead print the name of each input
file from which no output would normally have been printed. The
scanning will stop on the first match.
-l, --files-with-matches
Suppress normal output; instead print the name of each input
file from which output would normally have been printed. The
scanning will stop on the first match. (-l is specified by
POSIX.)
-q, --quiet, --silent
Quiet; do not write anything to standard output. Exit
immediately with zero status if any match is found, even if an
error was detected. Also see the -s or --no-messages option.
(-q is specified by POSIX.)
Sorry for quoting the man page at you, but sometimes it helps to screen things a bit.
Edit:
For a list of filenames that do not contain any of the procedures (case insensitive):
grep -EiL 'proc1|proc2' *
For a list of filenames that contain any of the procedures (case insensitive):
grep -Eil 'proc1|proc2' *
To list the files and show the match (case insensitive):
grep -Eio 'proc1|proc2' *
Start with your list of procedure names. For easy re-use later, sort them and make them lowercase, like so:
tr "[:upper:]" "[:lower:]" < list_of_procedures | sort > sorted_list_o_procs
... now you have a sorted list of the procedure names. Sounds like you're already using gnu grep, so you've got the -o option.
fgrep -o -i -f sorted_list_o_procs source1 source2 ... > list_of_used_procs
Note the use of fgrep: these aren't regexps, really, so why treat them as such. Hopefully you will also find that this magically corrects your output issues ;). Now you have an ugly list of the used procedures. Let's clean them up as we did the orginal list above.
tr "[:upper:]" "[:lower:]" < list_of_used_procs | sort -u > short_list
Now you have a short list of the used procedures. Let's find the ones in the original list that aren't in the short list.
fgrep -v -f short_list sorted_list_o_procs
... and there they are.