Explanation needed for truncate_stream example in jq manual - stream

I am studying the jq manual and got stuck on the truncate_stream example:
$ echo '1' | jq -c '[ 1 |truncate_stream([[0],1],[[1,0],2],[[1,0]],[[1]])]'
[[[0],2],[[0]]]
Can someone explain the example in detail?
Thanks for your interest in a basic question.
Cheers.

First, the manual is slightly misleading in that the input value shown ("Input 1")
is irrelevant. This can be seen e.g. from the fact that the following invocation produces the same array:
$ jq -n -c '[ 1 |truncate_stream([[0],1],[[1,0],2],[[1,0]],[[1]])]'
[[[0],2],[[0]]]
Now, to understand how we get from what I'll call the input stream:
[[0],1], [[1,0],2], [[1,0]], [[1]]
to the output stream:
[[0],2], [[0]]
it is helpful to remember that each array in the input stream either has the form
[path, value]
or else the form
[path]
The effect of N | truncate_stream(STREAM),
where N is a non-negative integer, is to remove the first N elements of each path,
with the understanding that any item whose path has no more than N elements (so that nothing, i.e. [], would be left) is removed altogether.
Thus, removing the first item from each path yields:
[[],1], [[0],2], [[0]], [[]]
and dropping the items whose path is now [] leaves:
[[0],2], [[0]]
Q.E.D.
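The rule can be checked by writing it as a small user-defined jq filter and running it next to the builtin. This is only a sketch: my_truncate and the wrapper my_truncate_demo are hypothetical names, and the definition is written from the rule described above rather than copied from jq's own source. It assumes jq is on the PATH.

```shell
# Hypothetical wrapper: runs a hand-written version of the truncation rule
# next to the builtin truncate_stream on the manual's example.
my_truncate_demo() {
    jq -nc '
      # Called with the integer N as input. For each event [path, value] or [path],
      # keep it only if its path is longer than N, then drop the first N path elements.
      def my_truncate(s): . as $n | s | select((.[0] | length) > $n) | .[0] |= .[$n:];
      [1 | my_truncate([[0],1],[[1,0],2],[[1,0]],[[1]])],
      [1 | truncate_stream([[0],1],[[1,0],2],[[1,0]],[[1]])]
    '
}
```

Running my_truncate_demo should print [[[0],2],[[0]]] twice, once per version. Note that jq -n is used, which again shows that the program's input plays no part here.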

Related

How to get ripgrep to tell me which expressions from a list have no matches on the filesystem

For instance, say I have the list of strings that I want to search for:
alfa bravo charlie delta nebuchadnezzar bartholomew
and in my repo there are files that contain alfa, bravo, charlie and delta, but there are no files that contain nebuchadnezzar and no files that contain bartholomew. Then I want the answer to be:
nebuchadnezzar bartholomew
As you might guess, I'm searching for deprecated things. I ended up using the following Ruby workaround, as I couldn't figure out a solution after reading man rg.
%w[alfa bravo charlie delta nebuchadnezzar bartholomew].each do |word|
  command = 'rg ' + word
  if `#{command}` == '' # execute the command, see if ripgrep found nothing
    puts word
  end
end
You can use the exit code of rg in a simple shell loop. According to the docs, rg exits with status 1 when no match is found and no error occurred (0 means a match was found, 2 means an error). Adopting that:
for word in alfa bravo charlie delta nebuchadnezzar bartholomew; do
    rg "$word" >/dev/null 2>&1
    [ "$?" -eq 1 ] && printf '%s\n' "no match for $word"
done
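The loop above runs one full search per word. With a long word list, a single pass can be cheaper: collect the words that do occur, then subtract them from the list. This sketch swaps in grep so that it runs without ripgrep; unmatched_words is a hypothetical helper name, and -r, -o and -h are common GNU/BSD grep extensions rather than POSIX. Unlike rg, grep -r does not skip files listed in .gitignore.

```shell
# unmatched_words DIR WORD...   (hypothetical helper name)
# Prints, one per line and in sorted order, each WORD found in no file under DIR.
unmatched_words() {
    local dir=$1
    shift
    local patterns found
    patterns=$(mktemp)
    found=$(mktemp)
    # The word list, deduplicated and sorted so that comm can compare it.
    printf '%s\n' "$@" | sort -u > "$patterns"
    # One recursive pass: -F fixed strings, -o print only the matched text,
    # -h omit file names. The result is the set of words actually seen.
    grep -rhoF -f "$patterns" -- "$dir" | sort -u > "$found"
    # comm -23 keeps only lines that appear in the first file but not the second.
    comm -23 "$patterns" "$found"
    rm -f "$patterns" "$found"
}
```

For example, unmatched_words . alfa bravo charlie delta nebuchadnezzar bartholomew. If one word is a substring of another (say alfa and alfabet), -o reports only one match at a given position, so the shorter word could be listed as unmatched; the per-word loop does not have that problem.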

Grepping twice using result of first Grep in Large file

I am given a list of IDs which I need to trace back to a name in a file.
The file ID contains:
1
2
3
4
5
6
The IDs are contained in a large 2 GB file called result.txt:
ABC=John,dhds,72828,73737,3939,92929
CDE=John,uubad,32424,ajdaio,343533
FG1=Peter,iasisaio,097282,iosoido
WER=Ann,97391279,89719379,7391739
result,**id=1**,iuhdihdio,ihwoihdoih,iuqhwiuh,ABC
result2,**id=2**,9729179,hdqihi,hidqi,82828,CDE
result3,**id=3**,biasi,8u9829,90u209w,jswjso,FG1
So I cat the ID file into a variable.
I then use this variable in a loop to grep out, with grep and cut -d, the codes in result.txt that link each ID back to a name, and store them in a variable,
so the variable contains ABC CDE FG1.
In the same loop I pass the output of that grep to another grep on result.txt to get the name,
i.e. I grep the file again for ABC, CDE and FG1.
I do get the answer, but it takes a long time. Is there a more efficient way?
Thanks
Making some assumptions about your requirements: IDs that are not found in the big file will not be shown in the output, and the desired output is in the format shown below.
Here are mock input files, f1 for the IDs and f2 for the large file:
[mathguy#localhost test]$ cat f1
1
2
3
4
5
6
[mathguy#localhost test]$ cat f2
ABC=John,dhds,72828,73737,3939,92929
CDE=John,uubad,32424,ajdaio,343533
FG1=Peter,iasisaio,097282,iosoido
WER=Ann,97391279,89719379,7391739
result,**id=1**,iuhdihdio,ihwoihdoih,iuqhwiuh,ABC
result2,**id=2**,9729179,hdqihi,hidqi,82828,CDE
result3,**id=3**,biasi,8u9829,90u209w,jswjso,FG1
Proposed solution and output:
[mathguy#localhost test]$ sed 's/.*/\*\*id=&\*\*/' f1 | grep -Ff - f2 | \
> sed -E 's/^.*\*\*id=([[:digit:]]*)\*\*.*,([^,]*)$/\1 \2/'
1 ABC
2 CDE
3 FG1
The hard work here is done by grep -F which might be just fast enough for your needs. There is some prep work and some clean-up work done by sed, but those are both on small datasets.
First we take the IDs from the input file and output strings in the format **id=<number>**. That output is passed to grep -F as a list of fixed-string patterns via the option -f (take the patterns from a file, in this case from stdin, given as -; that is, from the output of sed).
After we find the needed lines from the big file, the final sed just extracts the id and the name from each line.
Note: this assumes that each id is found only once in the big file. (Actually the command works regardless, but if there are duplicate lines for an id, your business users will have to tell you how to handle them. What if you get contradictory names for the same id? Etc.)
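The pipeline above stops at the code (ABC); the question's second step, looking up the name on the ABC=John line, can be folded into a single awk program that reads the big file twice, first to map wanted ids to codes and then to map those codes to names, holding only the wanted ids and codes in memory. This is a sketch: id_to_name is a hypothetical helper name, and it assumes the layout of the mock f2 above, including literal ** around each id.

```shell
# id_to_name IDFILE BIGFILE   (hypothetical helper name)
# Prints "id code name" for each id in IDFILE that BIGFILE links to a code.
# Assumes "CODE=Name,..." lines and "...,**id=N**,...,CODE" lines, as in the mock f2.
id_to_name() {
    awk '
        FILENAME == ARGV[1] { want[$1] = 1; next }   # the id list
        FNR == 1 { pass++ }                            # first or second read of BIGFILE
        pass == 1 {
            # Result lines: pull N out of **id=N** and take the last field as the code.
            if (match($0, /\*\*id=[0-9]+\*\*/)) {
                id = substr($0, RSTART + 5, RLENGTH - 7)
                if (id in want) {
                    n = split($0, f, ",")
                    code[id] = f[n]
                    need[f[n]] = 1
                }
            }
            next
        }
        {
            # Second read: "CODE=Name,..." lines give the name for a needed code.
            eq = index($0, "=")
            if (eq > 1) {
                c = substr($0, 1, eq - 1)
                if ((c in need) && !(c in name)) {
                    split(substr($0, eq + 1), g, ",")
                    name[c] = g[1]
                }
            }
        }
        END { for (id in code) print id, code[id], name[code[id]] }
    ' "$1" "$2" "$2"
}
```

With the mock files, id_to_name f1 f2 | sort -n should give 1 ABC John, 2 CDE John and 3 FG1 Peter; awk's for (id in code) loop has no defined order, hence the sort. If an id occurs more than once, the last occurrence wins.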

Grep words with exact two vowels

I have the following issue: I need to retrieve all words that contain exactly 2 vowels (in any order) from a file. The file contains only one word per line.
My current workaround is:
Grep1: Retrieve words such as earth, over, under, one...
grep -i "^[aeiou][^aeiou]*[aeiou][^aeiou]*$" genesis.words > A.txt
and
Grep2: Retrieve words such as formless, deep, said...
grep -i "^[^aeiou][^aeiou]*[aeiou][^aeiou]*[aeiou][^aeiou]*$" genesis.words > B.txt
The above solution works, but when I combine both regexes into a single regex, it returns nothing!
Mother of Grep1 & Grep2: should retrieve everything!
grep -i "^[aeiou][^aeiou]*[aeiou][^aeiou]*$|^[^aeiou][^aeiou]*[aeiou][^aeiou]*[aeiou][^aeiou]*$" genesis.words
I think the issue is in how I use ^ and $ in the expression, but I have tried different versions with no success!
Any help will be highly appreciated!
OS is AIX 6100-09-04-1441
You were close. This should work:
grep -i "^[^aeiou]*[aeiou][^aeiou]*[aeiou][^aeiou]*$" genesis.words > A.txt
It should match all eight possible shapes: two vowels delimit three non-vowel runs, each of which is either empty or not, and 2^3 is 8. For example (non-vowel runs in brackets):
[ ]I[ ]o[ ]
[ ]e[ ]a[r]
[ ]e[r]a[ ]
[ ]e[l]a[n]
[T]e[ ]a[ ]
[D]e[ ]a[r]
[D]e[w]a[r]
[D]a[w]a[ ]
[H]a[w]a[y]
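As for combining the two: in a basic regular expression, as used by plain grep, | is an ordinary character, so your pattern was looking for a literal |. With grep -E it means alternation, and you can anchor once:
^(regexp1|regexp2)$
(GNU grep also accepts \| in basic regular expressions, as ^\(regexp1\|regexp2\)$, but that is an extension and may not be available on AIX.)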
Since * matches zero or more times, you can start the pattern with [^aeiou]* as well; try:
"^[^aeiou]*[aeiou][^aeiou]*[aeiou][^aeiou]*$"
As for fixing your regex, with GNU grep you can escape the bar as \| (a GNU extension to basic regular expressions), so:
grep -i "^[aeiou][^aeiou]*[aeiou][^aeiou]*$\|^[^aeiou][^aeiou]*[aeiou][^aeiou]*[aeiou][^aeiou]*$" genesis.words
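For reference, here is the asker's two-pattern command using grep -E instead. In POSIX extended regular expressions | is alternation, so it needs no escaping and should not depend on GNU extensions. two_vowels is a hypothetical wrapper name.

```shell
# two_vowels [FILE...]: print lines (words) that contain exactly two vowels, ignoring case.
# -E makes | an alternation operator; each branch is anchored with its own ^ and $.
two_vowels() {
    grep -iE '^[aeiou][^aeiou]*[aeiou][^aeiou]*$|^[^aeiou][^aeiou]*[aeiou][^aeiou]*[aeiou][^aeiou]*$' "$@"
}
```

Usage: two_vowels genesis.words > A.txt. The single pattern ^[^aeiou]*[aeiou][^aeiou]*[aeiou][^aeiou]*$ matches the same words, since its leading [^aeiou]* may be empty.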
If you don't mind Perl, you could use this:
perl -lne '$m=$_; tr/aeiouAEIOU//cd; print $m if length()==2;' /usr/share/dict/words
That says... "save the current line (word) in $m. Delete everything that is not a vowel. Print the original word if there are two things (i.e vowels) left."
Note that I am using the system dictionary as input for my tests.
You could do pretty much the same thing in awk.
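The awk version might look like this sketch (two_vowels_awk is a hypothetical name). It relies on gsub returning the number of substitutions it made, which here is the number of vowels; it counts both upper- and lower-case vowels, to match grep -i.

```shell
# two_vowels_awk [FILE...]: print words containing exactly two vowels, ignoring case.
two_vowels_awk() {
    # Work on a copy (w) so the original line is printed unchanged.
    awk '{ w = $0; if (gsub(/[aeiouAEIOU]/, "", w) == 2) print }' "$@"
}
```

Usage: two_vowels_awk genesis.words > A.txt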
If you're able to use an alternative to grep, tr with wc works well:
words=/path/to/words.txt
while IFS= read -r word; do
    v=$(printf '%s' "$word" | tr -cd 'aeiouAEIOU' | wc -c)
    [[ $v -eq 2 ]] && printf '%s\n' "$word" >> output.txt
done < "$words"
This reads the original file line by line, counts the vowels & returns results with only 2 to output.txt.

extract a line from a file using csh

I am writing a csh script that will extract a line from a file xyz.
The xyz file contains a number of lines of code, and the line I am interested in appears 2-3 lines into the file.
I tried the following code
set product1 = `grep -e '<product_version_info.*/>' xyz`
As soon as the script finds that line, it should save it in a variable as a string and stop reading the file immediately, i.e. it should not read any further after extracting the line.
Please help!
grep has an -m or --max-count flag that tells it to stop after a specified number of matches. Hopefully your version of grep supports it.
set product1 = `grep -m 1 -e '<product_version_info.*/>' xyz`
From the GNU grep man page:
-m NUM, --max-count=NUM
Stop reading a file after NUM matching lines. If the input is
standard input from a regular file, and NUM matching lines are
output, grep ensures that the standard input is positioned to
just after the last matching line before exiting, regardless of
the presence of trailing context lines. This enables a calling
process to resume a search. When grep stops after NUM matching
lines, it outputs any trailing context lines. When the -c or
--count option is also used, grep does not output a count
greater than NUM. When the -v or --invert-match option is also
used, grep stops after outputting NUM non-matching lines.
As an alternative, since the line always occurs in the first 2-3 lines, you can use the command below to check just the first few lines:
set product1 = `head -3 xyz | grep -e '<product_version_info.*/>'`
I think you're asking to return the first matching line in the file. If so, one solution is to pipe the grep result to head
set product1 = `grep -e '<product_version_info.*/>' xyz | head -1`
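If grep -m is not available, awk can stop early portably: exit ends input processing as soon as the first match has been printed. first_match is a hypothetical helper name; the regex is passed with -v, which makes awk interpret backslash escapes in it, so this sketch uses a pattern without backslashes.

```shell
# first_match REGEX FILE   (hypothetical helper name)
# Prints the first line of FILE matching the extended regex REGEX,
# then stops processing FILE.
first_match() {
    awk -v re="$1" '$0 ~ re { print; exit }' "$2"
}
```

From csh the same idea should work inline, e.g. set product1 = `awk '/<product_version_info.*\/>/ { print; exit }' xyz`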

Find stored procedures not referenced in source code

I am trying to clean up a legacy database by dropping all procedures that are not used by the application. Using grep, I have been able to determine that a single procedure does not occur in the source code. Is there a way to do this for all of the procedures at once?
UPDATE: While using -E "proc1|proc2" produces an output of all lines in all files which match either pattern, this is not very useful. The legacy database has 2000+ procedures.
I tried to use the -o option thinking that I could use its output as the pattern for an inverse search on the original pattern. However, I found that there is no output when you use the -o option with more than one pattern.
Any other ideas?
UPDATE: After further experimenting, I found that it is the combination of the -i and -o options which are preventing the output. Unfortunately, I need a case insensitive search in this context.
Feed the list of stored procedures to egrep, separated by "|",
or:
for stored_proc in $stored_procs
do
    grep "$stored_proc" "$source_file"
done
I've had to do this in the past as well. Don't forget about any procs that may be called from other procs.
If you are using SQL Server you can use this:
SELECT name,
text
FROM sysobjects A
JOIN syscomments B
ON A.id = B.id
WHERE xtype = 'P'
AND text LIKE '%< sproc name >%'
I get output under the circumstances described in your edit:
$ echo "aaaproc1bbb" | grep -Eo 'proc1|proc2'
proc1
$ echo $?
0
$ echo "aaabbb" | grep -Eo 'proc1|proc2'
$ echo $?
1
The exit code shows if there was no match.
You might also find these options to grep useful (-L may be specific to GNU grep):
-c, --count
Suppress normal output; instead print a count of matching lines
for each input file. With the -v, --invert-match option (see
below), count non-matching lines. (-c is specified by POSIX.)
-L, --files-without-match
Suppress normal output; instead print the name of each input
file from which no output would normally have been printed. The
scanning will stop on the first match.
-l, --files-with-matches
Suppress normal output; instead print the name of each input
file from which output would normally have been printed. The
scanning will stop on the first match. (-l is specified by
POSIX.)
-q, --quiet, --silent
Quiet; do not write anything to standard output. Exit
immediately with zero status if any match is found, even if an
error was detected. Also see the -s or --no-messages option.
(-q is specified by POSIX.)
Sorry for quoting the man page at you, but sometimes it helps to screen things a bit.
Edit:
For a list of filenames that do not contain any of the procedures (case insensitive):
grep -EiL 'proc1|proc2' *
For a list of filenames that contain any of the procedures (case insensitive):
grep -Eil 'proc1|proc2' *
To list the files and show the match (case insensitive):
grep -Eio 'proc1|proc2' *
Start with your list of procedure names. For easy re-use later, sort them and make them lowercase, like so:
tr "[:upper:]" "[:lower:]" < list_of_procedures | sort > sorted_list_o_procs
... now you have a sorted list of the procedure names. It sounds like you're already using GNU grep, so you've got the -o option.
fgrep -o -i -h -f sorted_list_o_procs source1 source2 ... > list_of_used_procs
Note the use of fgrep (equivalent to grep -F): these aren't regexps, really, so why treat them as such. The -h stops grep from prefixing each match with its file name when several source files are searched; without it, the list would contain file names and none of the later comparisons would match. Hopefully you will also find that this magically corrects your output issues ;). Now you have an ugly list of the used procedures. Let's clean it up as we did the original list above.
tr "[:upper:]" "[:lower:]" < list_of_used_procs | sort -u > short_list
Now you have a short list of the used procedures. Let's find the ones in the original list that aren't in the short list; -x makes each name match only a whole line, so a used proc1 does not hide an unused proc10:
fgrep -v -x -f short_list sorted_list_o_procs
... and there they are.
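The same steps can be wrapped in one function, with comm doing the final subtraction (comm -23 prints lines found only in the first of two sorted files). This is a sketch: unused_procs is a hypothetical helper name, it assumes GNU or BSD grep for -o and -h, and names are compared in lowercase. As with any -o approach, if one procedure name is a prefix of another, it is worth checking those results by hand.

```shell
# unused_procs PROCLIST SOURCE...   (hypothetical helper name)
# Prints, in lowercase, the names from PROCLIST that appear in none of the SOURCE files.
unused_procs() {
    local list=$1
    shift
    local all used
    all=$(mktemp)
    used=$(mktemp)
    # Lowercased, deduplicated, sorted procedure list.
    tr '[:upper:]' '[:lower:]' < "$list" | sort -u > "$all"
    # -F fixed strings, -o only the matched text, -i ignore case, -h no file-name prefix.
    grep -Foih -f "$all" -- "$@" | tr '[:upper:]' '[:lower:]' | sort -u > "$used"
    # Names in the full list that never showed up in the sources.
    comm -23 "$all" "$used"
    rm -f "$all" "$used"
}
```

Usage: unused_procs list_of_procedures source1 source2 ...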
