extract a line from a file using csh - grep

I am writing a csh script that extracts a line from a file, xyz.
The xyz file contains a number of lines of code, and the line I am interested in appears after 2-3 lines of the file.
I tried the following code:
set product1 = `grep -e '<product_version_info.*/>' xyz`
As soon as the script finds that line, I want it to save the line in a variable as a string and stop reading the file immediately, i.e. it should not read any further after extracting the line.
Please help!

grep has an -m or --max-count flag that tells it to stop after a specified number of matches. Hopefully your version of grep supports it.
set product1 = `grep -m 1 -e '<product_version_info.*/>' xyz`
From the grep man page:
-m NUM, --max-count=NUM
Stop reading a file after NUM matching lines. If the input is
standard input from a regular file, and NUM matching lines are
output, grep ensures that the standard input is positioned to
just after the last matching line before exiting, regardless of
the presence of trailing context lines. This enables a calling
process to resume a search. When grep stops after NUM matching
lines, it outputs any trailing context lines. When the -c or
--count option is also used, grep does not output a count
greater than NUM. When the -v or --invert-match option is also
used, grep stops after outputting NUM non-matching lines.
As an alternative, you can always use the command below to check just the first few lines (since the line always occurs in the first 2-3 lines):
set product1 = `head -3 xyz | grep -e '<product_version_info.*/>'`

I think you're asking to return the first matching line in the file. If so, one solution is to pipe the grep result to head
set product1 = `grep -e '<product_version_info.*/>' xyz | head -1`
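For reference, here is a small sketch of the "first match only" behaviour, written in POSIX sh rather than csh so it can be run as-is (in csh the same commands would go inside set product1 = `...`). The sample file contents are invented for illustration. The sed variant is an alternative not mentioned above: it prints the first matching line and then quits, which may help where grep lacks -m.

```shell
# Build a small sample file; its contents are made up for illustration.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<?xml version="1.0"?>
<header/>
<product_version_info version="1.2"/>
<other/>
<product_version_info version="9.9"/>
EOF

# grep with -m 1 (GNU/BSD): stop after the first matching line.
first_grep=$(grep -m 1 -e '<product_version_info.*/>' "$tmp")

# sed fallback: print the first matching line, then quit immediately.
first_sed=$(sed -n '/<product_version_info.*\/>/{p;q;}' "$tmp")

echo "$first_grep"
echo "$first_sed"
rm -f "$tmp"
```

Both variables should end up holding only the first product_version_info line, even though the sample file contains two.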

Related

Use shell variable in grep lookahead in csh

I am trying to utilize a grep lookahead to get a value at the end of a line for a project I'm working on. The main issue I'm having is that I'm not sure how to use a shell variable in the grep lookahead syntax in cshell
Here's the gist of what I'm trying to do.
There will be a dogfile.txt with several lines listing the names of dogs in the format below
genericDog2033, pomeranian
genericDog2034, greatDane
genericDog2035, Doberman
I wanted a way of retrieving the breed of the dog after the comma on each line so I thought a grep lookahead might be a good way of doing it. The project I'm working on isn't so hard-coded however, so I have no way of knowing what genericDog number I am searching for. There will be a shell variable in a greater while loop which will have access to the dog name.
For example if I set the dogNumber variable to the first dog in the file like so:
set dogNumber = genericDog2033
I then try to access the value of dogNumber in the grep lookahead
set dogBreed = `cat File.txt | grep -oP '(?<=$dogNumber ,)[^ ]*'`
The problem with the line above is that I think grep is looking for the literal string "$dogNumber ," in the file, which obviously doesn't exist. Is there some sort of wrapper I can put around the shell variable so csh knows that dogNumber is a variable? I'm also open to other methods of doing this. Any help would be appreciated; this is literally the last line of code I need to finish my project and I'm at my wits' end.
Variable expansion happens inside double quotes ("), but not inside single quotes ('):
% set var = 'hello'
% echo '$var'
$var
% echo "$var"
hello
Furthermore, you have an error in your regexp:
(?<=$dogNumber ,)[^ ]*
In your data, the space is after the comma, not before.
% set dogNumber = genericDog2033
% set dogBreed = `cat a | grep -oP "(?<=$dogNumber, )[^ ]*"`
% echo $dogBreed
pomeranian
The easiest way to debug this is to not use variables at all in the first place, and simply check if the grep works:
% grep -oP "(?<=genericDog2034 ,)[^ ].*" a
[no output]
First make the grep work with static data, then substitute the shell variable into the pattern, and finally put it all together by assigning the result to a variable.
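Since the question is open to other methods, here is a sketch that avoids -P altogether: awk splits each line on ", " and compares the first field with the variable. It is written in POSIX sh so it can be tested as-is; in csh the same awk command can sit inside backticks. The temporary file stands in for dogfile.txt.

```shell
# Sample data in the format from the question (temporary file stands in for dogfile.txt).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
genericDog2033, pomeranian
genericDog2034, greatDane
genericDog2035, Doberman
EOF

dogNumber=genericDog2034

# -F', ' splits on comma+space; -v passes the shell variable into awk,
# so no quoting tricks are needed inside the awk program itself.
dogBreed=$(awk -F', ' -v name="$dogNumber" '$1 == name { print $2 }' "$tmp")

echo "$dogBreed"
rm -f "$tmp"
```

Comparing the whole first field also avoids partial matches, for example genericDog203 accidentally matching genericDog2033.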

Why does my grep command output "--" between some lines?

I have a fasta file like the test one here:
>HWI-D00196:168:C66U5ANXX:3:1106:16404:19663 1:N:0:GCCAAT
CCTAGCACCATGATTTAATGTTTCTTTTGTACGTTCTTTCTTTGGAAACTGCACTTGTTGCAACCTTGCAAGCCATATAAACACATTTCAGATATAAGGCT
>HWI-D00196:168:C66U5ANXX:3:1106:16404:19663 2:N:0:GCCAAT
AAAACATAAATTTGAGCTTGACAAAAATTAAAAATGAGCCCAGCCTTATATCTGAAATGTGTTTATATGGCTTGCAAGGTTGCAACAAGTGCAGTTTCCAA
>HWI-D00196:168:C66U5ANXX:4:1304:10466:100132 1:N:0:GCCAAT
ATATTTGAATTATCAGAAATAAACACAAAGAAAACCTAGAACAGATAATTTCTTCCACATTATTGATCAGATACAGATTTCAAGGGTACCGTTGTGAATTG
>HWI-D00196:168:C66U5ANXX:4:1304:10466:100132 2:N:0:GCCAAT
AAACGATTGATAGATCTATTTGCATTATAAAAACATTAAAAAAACAAAATACTGATTAAATGTCGTCTTTCTATTCCACAATTTTATAGATCTCACTGTAT
>HWI-D00196:168:C66U5ANXX:4:1307:12056:64030 1:N:0:GCCAAT
CTTACTTTGCCTCTCTCAGCCAATGTCTCCTGAGTCTAATTTTTTGGAGGCTAAGCTATGAGCTAATGATGGGTTCCATTTGGGGCCAATGCTTCAGCCTG
>HWI-D00196:168:C66U5ANXX:4:1307:12056:64030 2:N:0:GCCAAT
CTATTAGTTCTTATCTTTGCCTGCAAATATAAGACTAGCGCTTGAGTAGCTGACAGAGACAAAGTAAGCTGGAGTGTTTATCACCTGGTCACTCCAATTGT
When I type in a simple grep command like:
grep -B1 "CTT" test.fasta
I get a really strange output in which "--" is sometimes placed on a newline above the grep hit like so:
>HWI-D00196:168:C66U5ANXX:4:1304:10466:100132 2:N:0:GCCAAT
AAACGATTGATAGATCTATTTGCATTATAAAAACATTAAAAAAACAAAATACTGATTAAATGTCGTCTTTCTATTCCACAATTTTATAGATCTCACTGTAT
--
>HWI-D00196:168:C66U5ANXX:4:1307:12056:64030 2:N:0:GCCAAT
CTATTAGTTCTTATCTTTGCCTGCAAATATAAGACTAGCGCTTGAGTAGCTGACAGAGACAAAGTAAGCTGGAGTGTTTATCACCTGGTCACTCCAATTGT
I can't figure out why some fasta entries have this and others don't. I don't get this problem when I remove the -B1. I can remove those lines from my file with a grep -v "--" statement, but I'd really like to understand what's going on here.
You are asking for one line of leading context by using the -B1 option. This means grep will display both the line which matched and the line directly before it. Each group of contiguous output lines is separated from the next by -- on a line by itself, as shown below:
$ man grep | grep -B1 context
-A num, --after-context=num
Print num lines of trailing context after each match. See also
--
-B num, --before-context=num
Print num lines of leading context before each match. See also
--
-C[num, --context=num]
Print num lines of leading and trailing context surrounding each
--
--context[=num]
Print num lines of leading and trailing context. The default is
The reason you aren't seeing -- between every match is that -- is only printed between groups of lines that are not adjacent in the input; consecutive matches, together with their context, form a single group. See the following example:
seq 13 | grep -B1 1
1
--
9
10
11
12
13
The seq command produces all the numbers between 1 and 13. Only the first line and the lines from 10 on contain a 1, so you see the 1 in its own group, then --, then the one line context, then the group of consecutive matching lines.
The GREP_COLORS section of the grep man page says:
Specifies the colors and other attributes used to highlight various
parts of the output. Its value is a colon-separated list
of capabilities that defaults to
ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36 with the rv and
ne boolean capabilities omitted (i.e., false).
and
se=36 SGR substring for separators that are inserted between
selected line fields (:), between context line fields, (-), and
between groups of adjacent lines when nonzero context is
specified (--). The default is a cyan text foreground over the
terminal's default background.
Consider file sample.txt :
$cat sample.txt
ABBB
AAB
AAB
S
S
S
AABB
ABAA
BAA
CCC
$grep -B2 'AAB' sample.txt
ABBB
AAB
AAB
--
S
S
AABB
Here -- is grep's way of telling you that the AAB before -- and the S after it are not adjacent lines in the actual file.
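If the separator lines are unwanted, one way to drop them is a second grep that removes lines consisting solely of "--". The sketch below reuses the seq example; it assumes no real data line is exactly "--", which holds for FASTA headers and sequences.

```shell
# Matches are 1 and 10..13; 9 appears as one line of leading context.
with_sep=$(seq 13 | grep -B1 1)

# -x matches whole lines only; -e keeps the leading dashes from being read as an option.
without_sep=$(seq 13 | grep -B1 1 | grep -v -x -e '--')

echo "$with_sep"
echo "$without_sep"
```

Passing the pattern with -e sidesteps the problem that a bare "--" argument is normally taken as the end-of-options marker.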

duplicate grep output when comparing two files

I have literally been at this for 5 hours. I have busybox on my device, and I unfortunately do not have -X in grep to make my life easier.
Edit:
I have two lists, both containing MAC addresses. Essentially I just want offline MAC address lookup so I don't have to keep looking vendors up online.
list.txt holds vendor MAC prefixes. This isn't the complete list, just an example:
00:13:46
00:15:E9
00:17:9A
00:19:5B
00:1B:11
00:1C:F0
scan holds a list of full-length MAC addresses whose vendors are unknown. Whenever there is a match, I want the line in scan to be output.
It pretty much does that, but it first outputs everything from the scan file and then outputs the matching address at the end, causing duplicates. I tried sort -u, but it has no effect. It is as if there are two different outputs from two different methods: it instantly outputs the whole scan file, and a couple of seconds later it outputs the matching one.
From searching I came across this:
#!/bin/bash
while read line; do
grep -F 'list' 'scan'
done < list.txt
which displays a duplicate result when a match is found: the output is pretty much my scan file echoed back, followed by the matched pattern, thus creating duplicates.
It is frustrating that I have not found a solution after clicking on all the links in Google up to page 9.
Please, someone help me.
I don't know if the Busybox sed supports this out of the box; if it doesn't, it should be easy to do in Awk or Perl instead.
Create a sed script that prints the lines from file2 which start with a prefix listed in file1, by transforming each line in file1 into a sed command that prints lines matching that regular expression, anchored to the start of the line:
sed 's%.*%/^&/p%' file1 | sed -n -f - file2
The same in Awk:
awk 'NR==FNR { a[++i]="^" $0; next }
{ for (j=1; j<=i; ++j) if ($0 ~ a[j]) print }' file1 file2
OK guys, I did a nested for loop (probably very inefficient), but I got it working, printing the matching MAC addresses, using this:
#!/usr/bin/bash
for scanlist in `cat scan | cut -d: -f1,2,3`
do
    for listt in `cat list`
    do
        if [[ $scanlist == $listt ]]; then
            grep $scanlist scan
        fi
    done
done
Feel free to make this more elegant, but it works for me for now. I think the problem I had was that one list contained just 00:11:22 while my other list contained 00:11:22:33:44:55, which is why I cut the scan entries down to the same length as the entries in my other list. So this only outputs the matches instead of producing duplicate output.
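A possibly simpler sketch of the same lookup, assuming the grep on the device supports -f (read patterns from a file) and -i: prepend ^ to every prefix so it only matches at the start of a line, then let a single grep pass filter the scan file. The sample addresses and the temporary file names are made up.

```shell
dir=$(mktemp -d)

# Vendor prefixes (a subset of the example list) and some invented scan results.
printf '%s\n' 00:13:46 00:15:E9 00:17:9A > "$dir/list.txt"
printf '%s\n' 00:15:e9:12:34:56 AA:BB:CC:00:13:46 00:13:46:AB:CD:EF 11:22:33:44:55:66 > "$dir/scan"

# Anchor every prefix to the start of the line: 00:13:46 becomes ^00:13:46.
sed 's/^/^/' "$dir/list.txt" > "$dir/patterns"

# One pass over scan; -i ignores case, -f reads the patterns from the file.
matches=$(grep -i -f "$dir/patterns" "$dir/scan")

echo "$matches"
rm -rf "$dir"
```

Because the patterns are anchored, AA:BB:CC:00:13:46 is not reported even though it contains a known prefix later in the address, and each scan line is printed at most once.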

Recursively grep results and pipe back

I need to find matches for some terms listed in a file, then recursively search for the next term only in the files that matched previously. I have something like this:
input.txt
123
22
33
These terms need to be found in a set of files. The challenge is that if 123 is found in, say, 10 files, then 22 should be searched for in those 10 files only, and so on.
The files are named like f1, f2, f3, f4, ..., f1200,
so it is like I need to run grep -w "123" f* | grep -w "22" | .....
It's not possible to list them manually, so is there an easier way?
You can solve this with an awk script that generates the pipeline for you. I've encountered a similar problem, and this approach works:
awk 'NR==1 { printf("grep -lw %s f*", $1); next } { printf(" | xargs grep -lw %s", $1) } END { print "" }' input.txt | sh
What does it do?
It reads input.txt line by line.
For the first record it prints grep -lw TERM f*, which lists the files containing the first term.
For every following record it prints a pipe followed by xargs grep -lw TERM, so each grep searches only the files listed by the previous one.
At the end it prints a newline, and the generated command line is sent to the shell for execution.
Perhaps taking a meta-programming viewpoint would help. Have grep output a series of grep commands. Or write a little Perl program. Maybe Ruby, if the mood suits.
You can use grep -lw to write the list of file names that matched (note that grep stops reading each file at its first match).
Capture that list of file names and use it as the set of files to search in the next iteration of a loop.
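The loop described above might look like the following sketch. It assumes file names without whitespace (true for f1 ... f1200) and uses a temporary directory with four invented sample files in place of the real data.

```shell
dir=$(mktemp -d)

# Search terms, one per line, as in input.txt from the question.
printf '123\n22\n33\n' > "$dir/input.txt"

# Invented sample files: only f1 and f4 contain all three terms.
printf 'a 123 b\n22\n33\n' > "$dir/f1"
printf '123\n22\n'         > "$dir/f2"
printf '22\n33\n'          > "$dir/f3"
printf '123 22 33\n'       > "$dir/f4"

# Start with every candidate file, then narrow the list one term at a time.
files=$(ls "$dir"/f*)
while IFS= read -r term; do
    # Stop early once no file is left to search.
    [ -n "$files" ] || break
    # $files is intentionally unquoted so it splits into one argument per file.
    files=$(grep -lw -e "$term" $files)
done < "$dir/input.txt"

echo "$files"
rm -rf "$dir"
```

Each pass only reads the files that survived the previous pass, which is the narrowing behaviour asked for in the question.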

Find stored procedures not referenced in source code

I am trying to clean up a legacy database by dropping all procedures that are not used by the application. Using grep, I have been able to determine that a single procedure does not occur in the source code. Is there a way to do this for all of the procedures at once?
UPDATE: While using -E "proc1|proc2" produces an output of all lines in all files which match either pattern, this is not very useful. The legacy database has 2000+ procedures.
I tried to use the -o option thinking that I could use its output as the pattern for an inverse search on the original pattern. However, I found that there is no output when you use the -o option with more than one pattern.
Any other ideas?
UPDATE: After further experimenting, I found that it is the combination of the -i and -o options which is preventing the output. Unfortunately, I need a case-insensitive search in this context.
Feed the list of stored procedures to egrep, separated by "|",
or:
for stored_proc in $stored_procs
do
    grep "$stored_proc" "$source_file"
done
I've had to do this in the past as well. Don't forget about any procs that may be called from other procs.
If you are using SQL Server you can use this:
SELECT name,
text
FROM sysobjects A
JOIN syscomments B
ON A.id = B.id
WHERE xtype = 'P'
AND text LIKE '%< sproc name >%'
I get output under the circumstances described in your edit:
$ echo "aaaproc1bbb" | grep -Eo 'proc1|proc2'
proc1
$ echo $?
0
$ echo "aaabbb" | grep -Eo 'proc1|proc2'
$ echo $?
1
The exit code shows whether there was a match (0) or not (1).
You might also find these options to grep useful (-L may be specific to GNU grep):
-c, --count
Suppress normal output; instead print a count of matching lines
for each input file. With the -v, --invert-match option (see
below), count non-matching lines. (-c is specified by POSIX.)
-L, --files-without-match
Suppress normal output; instead print the name of each input
file from which no output would normally have been printed. The
scanning will stop on the first match.
-l, --files-with-matches
Suppress normal output; instead print the name of each input
file from which output would normally have been printed. The
scanning will stop on the first match. (-l is specified by
POSIX.)
-q, --quiet, --silent
Quiet; do not write anything to standard output. Exit
immediately with zero status if any match is found, even if an
error was detected. Also see the -s or --no-messages option.
(-q is specified by POSIX.)
Sorry for quoting the man page at you, but sometimes it helps to screen things a bit.
Edit:
For a list of filenames that do not contain any of the procedures (case insensitive):
grep -EiL 'proc1|proc2' *
For a list of filenames that contain any of the procedures (case insensitive):
grep -Eil 'proc1|proc2' *
To list the files and show the match (case insensitive):
grep -Eio 'proc1|proc2' *
Start with your list of procedure names. For easy re-use later, sort them and make them lowercase, like so:
tr "[:upper:]" "[:lower:]" < list_of_procedures | sort > sorted_list_o_procs
... now you have a sorted list of the procedure names. Sounds like you're already using GNU grep, so you've got the -o option.
fgrep -o -i -f sorted_list_o_procs source1 source2 ... > list_of_used_procs
Note the use of fgrep: these aren't regexps, really, so why treat them as such? Hopefully you will also find that this magically corrects your output issues ;). Now you have an ugly list of the used procedures. Let's clean them up as we did the original list above.
tr "[:upper:]" "[:lower:]" < list_of_used_procs | sort -u > short_list
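Now you have a short list of the used procedures. Let's find the ones in the original list that aren't in the short list. The -x option restricts matches to whole lines, so a procedure whose name merely contains a used name is not dropped by mistake.
fgrep -v -x -f short_list sorted_list_o_procs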
... and there they are.
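As a cross-check, the same question can be answered one procedure at a time using only grep's exit status. This is a sketch under assumptions: the procedure names, file names and directory layout are invented, and -r (recursive search) is a GNU/BSD extension rather than POSIX.

```shell
dir=$(mktemp -d)
mkdir "$dir/src"

# Hypothetical procedure list and source tree, for illustration only.
printf 'usp_GetUser\nusp_DeleteUser\nusp_Archive\n' > "$dir/procs.txt"
printf 'EXEC usp_getuser @id\n' > "$dir/src/a.sql"
printf 'call("USP_ARCHIVE")\n'  > "$dir/src/b.py"

# -q: report via exit status only; -i: ignore case; -r: recurse into the directory;
# -F: treat the name as a literal string; -w: match whole words only.
unused=$(while IFS= read -r proc; do
    grep -qirFw -e "$proc" "$dir/src" || printf '%s\n' "$proc"
done < "$dir/procs.txt")

echo "$unused"
rm -rf "$dir"
```

This runs one grep per procedure, so with 2000+ procedures it will be slower than the single-pass fgrep -f approach, but it sidesteps the -i/-o interaction entirely. As noted earlier, procedures called only from other procedures still need a separate check.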
