For instance, say I have the list of strings that I want to search for:
alfa bravo charlie delta nebuchadnezzar bartholomew
and in my repo there are files that contain alfa, bravo, charlie and delta, but there are no files that contain nebuchadnezzar and no files that contain bartholomew. Then I want the answer to be:
nebuchadnezzar bartholomew
As you might guess, I'm searching for deprecated things. I ended up using the following Ruby workaround, as I couldn't figure out a solution from man rg.
%w[alfa bravo charlie delta nebuchadnezzar bartholomew].each do |word|
  command = 'rg ' + word
  if `#{command}` == '' # execute the command, see if ripgrep found nothing
    puts word
  end
end
You can use the exit code of rg in a simple shell loop. According to the docs, rg exits with status 1 when the regex matches nothing and no error occurred. Applying that:
for word in alfa bravo charlie delta nebuchadnezzar bartholomew; do
    rg "$word" >/dev/null 2>&1
    [ "$?" -eq 1 ] && printf '%s\n' "no match for $word"
done
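For machines without ripgrep, the same exit-status idea can be sketched with plain grep -r, which also exits with 1 when nothing matches. The function name find_unused and the throwaway demo directory are made up for illustration, and -F (fixed strings) is an assumption about how the words should be matched.

```shell
# Hypothetical helper: print every word that occurs nowhere under a directory.
find_unused() {
    dir=$1; shift
    for word in "$@"; do
        status=0
        grep -rqF -e "$word" "$dir" || status=$?
        # Status 1 means "no match"; 2 or higher would signal an error.
        if [ "$status" -eq 1 ]; then
            printf '%s\n' "$word"
        fi
    done
}

# Demo on a throwaway directory standing in for the repo.
demo=$(mktemp -d)
printf 'alfa bravo\n' > "$demo/a.txt"
printf 'charlie delta\n' > "$demo/b.txt"
unused=$(find_unused "$demo" alfa bravo charlie delta nebuchadnezzar bartholomew)
printf '%s\n' "$unused"
rm -rf "$demo"
```

Printing only the word, without the "no match for" prefix, gives the output format asked for in the question.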
I am given a list of IDs, each of which I need to trace back to a name in a file.
The file ID contains:
1
2
3
4
5
6
The IDs are contained in a large 2 GB file called result.txt:
ABC=John,dhds,72828,73737,3939,92929
CDE=John,uubad,32424,ajdaio,343533
FG1=Peter,iasisaio,097282,iosoido
WER=Ann,97391279,89719379,7391739
result,**id=1**,iuhdihdio,ihwoihdoih,iuqhwiuh,ABC
result2,**id=2**,9729179,hdqihi,hidqi,82828,CDE
result3,**id=3**,biasi,8u9829,90u209w,jswjso,FG1
So I cat the ID file into a variable.
I then use this variable in a loop to grep the matching lines out of result.txt, using grep and cut -d to pull out the value that links back to the name, and store that output in a variable,
so the variable contains ABC CDE FG1.
In the same loop I pass the output of that grep to another grep on result.txt to get the name,
i.e. I re-grep the file for ABC CDE FG1.
I do get the answer, but it takes a long time. Is there a more efficient way?
Thanks
Making some assumptions about your requirement... ID's that are not found in the big file will not be shown in the output; the desired output is in the format shown below.
Here are mock input files - f1 for the id's and f2 for the large file:
[mathguy#localhost test]$ cat f1
1
2
3
4
5
6
[mathguy#localhost test]$ cat f2
ABC=John,dhds,72828,73737,3939,92929
CDE=John,uubad,32424,ajdaio,343533
FG1=Peter,iasisaio,097282,iosoido
WER=Ann,97391279,89719379,7391739
result,**id=1**,iuhdihdio,ihwoihdoih,iuqhwiuh,ABC
result2,**id=2**,9729179,hdqihi,hidqi,82828,CDE
result3,**id=3**,biasi,8u9829,90u209w,jswjso,FG1
Proposed solution and output:
[mathguy#localhost test]$ sed 's/.*/\*\*id=&\*\*/' f1 | grep -Ff - f2 | \
> sed -E 's/^.*\*\*id=([[:digit:]]*)\*\*.*,([^,]*)$/\1 \2/'
1 ABC
2 CDE
3 FG1
The hard work here is done by grep -F which might be just fast enough for your needs. There is some prep work and some clean-up work done by sed, but those are both on small datasets.
First we take the id's from the input file and we output strings in the format **id=<number>**. The output is presented as the fixed-character patterns to grep -F via the option -f (take the patterns from file, in this case from stdin, invoked as -; that is, from the output of sed).
After we find the needed lines from the big file, the final sed just extracts the id and the name from each line.
Note: this assumes that each id is only found once in the big file. (Actually the command will work regardless; but if there are duplicate lines for an id, your business users will have to tell you how to handle. What if you get contradictory names for the same id? Etc.)
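The pipeline above prints the id and the key at the end of each line (ABC, CDE, FG1). If the name behind the key is wanted as well, a single awk pass over the big file is one possible sketch. It assumes, as in the mock data, that the KEY=Name lines come before the result lines, that the id sits in the second comma-separated field, and that the key is the last field. None of this is confirmed for the real 2 GB file.

```shell
tmp=$(mktemp -d)
printf '%s\n' 1 2 3 4 5 6 > "$tmp/f1"
cat > "$tmp/f2" <<'EOF'
ABC=John,dhds,72828,73737,3939,92929
CDE=John,uubad,32424,ajdaio,343533
FG1=Peter,iasisaio,097282,iosoido
WER=Ann,97391279,89719379,7391739
result,**id=1**,iuhdihdio,ihwoihdoih,iuqhwiuh,ABC
result2,**id=2**,9729179,hdqihi,hidqi,82828,CDE
result3,**id=3**,biasi,8u9829,90u209w,jswjso,FG1
EOF

names=$(awk -F, '
    # First file: remember each wanted id in the form it takes in the big file.
    NR == FNR { want["**id=" $0 "**"]; next }
    # KEY=Name,... lines: remember the name for each key.
    /^[^,=]+=/ { split($1, kv, "="); name[kv[1]] = kv[2]; next }
    # Result lines: assumed to carry the id in field 2 and the key in the last field.
    ($2 in want) { id = $2; gsub(/[^0-9]/, "", id); print id, $NF, name[$NF] }
' "$tmp/f1" "$tmp/f2")
printf '%s\n' "$names"
rm -rf "$tmp"
```

On the mock files this prints "1 ABC John", "2 CDE John" and "3 FG1 Peter". If the KEY=Name lines can appear after the result lines, the file would have to be read twice instead.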
I'm trying to reduce a .sm file, file1 (around 10 GB), by filtering it using a fairly long set of words (around 180.108 items) listed in a text file, file2.
File1 is structured as follows:
word <http://internet.address.com> 1
i.e. one word followed by a blank space, an internet address, and a number.
File2 is a simple .txt file, a list of words, one on each line.
My aim is to create a third file File3 containing only those lines in file1 whose first word matches with the word-list of file2, and disregard the rest.
My attempt is the following:
grep -w -F -f file2.txt file1.sm > file3.sm
I've also attempted something along this line:
gawk 'FNR==NR {a[$1]; next } !($2 in a)' file2.txt file1.sm > file3.sm
but with no success. I understand /^ and \b might play a part here, but I don't know how to fit them in the syntax. I've looked around extensively but no solution seems to fit.
My problem is that grep here matches against the entire line of file1, so the matching word may lie in the web address, which I'm not interested in.
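Anchor each word at the start of the line and follow it with the separating space, so that a word cannot match as a prefix of a longer one. Then feed the patterns to grep (this assumes the words contain no regexp metacharacters):
sed 's/.*/^& /' file2.txt | grep -f - file1.sm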
join is the best tool for this, not grep/awk:
join -t' ' <(sort file1.sm) <(sort file2.txt) >file3.sm
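join needs both inputs sorted, which may be inconvenient for a 10 GB file. An alternative sketch is the usual two-file awk idiom, comparing only the first field of file1 against the word list. The sample words and example.com addresses below are invented for the demo. The real command would be the awk line applied to file2.txt and file1.sm.

```shell
tmp=$(mktemp -d)
printf '%s\n' apple kiwi > "$tmp/file2.txt"
cat > "$tmp/file1.sm" <<'EOF'
apple <http://example.com/a> 1
pear <http://example.com/apple> 2
apples <http://example.com/b> 3
kiwi <http://example.com/c> 4
EOF

# Pass 1 (file2): store each listed word as an array key.
# Pass 2 (file1): print a line only if its first field is one of those keys.
awk 'NR == FNR { words[$1]; next } $1 in words' \
    "$tmp/file2.txt" "$tmp/file1.sm" > "$tmp/file3.sm"
kept=$(cat "$tmp/file3.sm")
printf '%s\n' "$kept"
rm -rf "$tmp"
```

Only the apple and kiwi lines survive. The pear line is dropped even though its address contains "apple", and "apples" is dropped because the comparison is on the whole first field, not a prefix.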
I am trying to scan a file (test.txt), something like this:
make
bake
baker
makes
take
cook
sbake
for patterns listed in a separate file (ref.txt):
ake
make
bake
look
I have tried looping with grep like so:
while read seq; do grep -c "$seq" test.txt; done > out.txt < ref.txt
However, it doesn't count partial matches, only exact matches (or it counts partial matches inconsistently), and I get the output:
4
1
2
0
instead of
6
2
3
0
Thanks for any help!
See "Why is using a shell loop to process text considered bad practice?" for some, but not all, of the reasons not to try to do this with a shell loop.
The standard UNIX tool for manipulating text is awk:
$ awk 'NR==FNR{cnt[$0]=0;next} {for (re in cnt) cnt[re]+=gsub(re,"&")} END{for (re in cnt) print re, cnt[re]}' ref.txt test.txt
ake 6
bake 3
look 0
make 2
The above assumes the text in your ref.txt file doesn't contain any regexp metacharacters, or that if it does, a regexp match is desirable. If it can contain them but you need a string match rather than a regexp match, you'd need a slightly different solution.
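One way such a string-matching variant might look is sketched below. It replaces gsub() with a loop around index(), which searches for a literal substring. Like the gsub() version, it counts non-overlapping occurrences, not matching lines. The temporary directory and the trailing sort (to make the output order predictable) are additions for the demo.

```shell
tmp=$(mktemp -d)
printf '%s\n' make bake baker makes take cook sbake > "$tmp/test.txt"
printf '%s\n' ake make bake look > "$tmp/ref.txt"

counts=$(awk '
    # First file: one counter per non-empty search string.
    NR == FNR { if ($0 != "") cnt[$0] = 0; next }
    {
        for (s in cnt) {
            rest = $0
            # index() does a literal substring search, so metacharacters are inert.
            while ((i = index(rest, s)) > 0) {
                cnt[s]++
                rest = substr(rest, i + length(s))
            }
        }
    }
    END { for (s in cnt) print s, cnt[s] }
' "$tmp/ref.txt" "$tmp/test.txt" | sort)
printf '%s\n' "$counts"
rm -rf "$tmp"
```

On the sample files this gives the same counts as the regexp version: ake 6, bake 3, look 0, make 2.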
$ while read -r line; do grep -c -- "$line" test.txt; done < ref.txt
6
2
3
0
I just want to take the difference of two files and write it to another file without patch tags like + or - or diff tags like > or <. I understand how patches work and how to use the following commands:
diff file1.txt file2.txt | grep ">" > difffile.txt
diff -u file1.txt file2.txt > difffile.patch
patch original.txt < difffile.patch
but when I open my difffile.txt from the first command, I get something like this:
> some line of text
> some other line of text
when what I really want is:
some line of text
some other line of text
I thought that maybe indexing the string like
${stringname:2}
would work, but I don't know how to use that with grep or how to index a grep string.
I'm actually parsing HTML and XML and just want the value differences written to a file. I don't know how to do that.
If you just want to remove the first two characters of every line, cut is your friend:
cut -c3- file
Test
$ cat a
hello this is me
and this is you
$ cut -c3- a
llo this is me
d this is you
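Applied to the diff output from the question, the filtering and the stripping can also be done in one step. The sketch below uses sed for both. The file contents are invented for the demo. Anchoring the pattern with ^ matters for HTML and XML input, because an unanchored grep ">" would also keep "<" lines that merely contain a closing angle bracket.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'keep me' > "$tmp/file1.txt"
printf '%s\n' 'keep me' 'some line of text' 'some other line of text' > "$tmp/file2.txt"

# diff exits with status 1 when the files differ, which is expected here.
# sed -n prints only lines where the leading "> " was found and removed.
added=$( { diff "$tmp/file1.txt" "$tmp/file2.txt" || true; } | sed -n 's/^> //p' )
printf '%s\n' "$added"
rm -rf "$tmp"
```

The equivalent with the tools already shown would be diff file1.txt file2.txt | grep '^>' | cut -c3-.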