How do I invoke --stream in the middle of a jq query?

I have a file of newline-separated JSON lists, and I would like to count the total number of elements across all of them. I can do this with two invocations of jq, like so:
cat file.nsj | jq -s ".[] | length" | jq -s "add"
But I would prefer to do it in a single jq invocation. Is this possible?

If your goal is just to count the number of objects in the file full of lists, you could do this:
$ jq -n 'reduce inputs as $i (0; . + ($i | length))' file.nsj

Here's a variation of Jeff's solution which uses -n, inputs, length, and add.
jq -n '[ inputs | length ] | add' file.nsj
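For example, with a hypothetical file.nsj containing a three-element and a two-element list, both approaches report the total element count:
$ cat file.nsj
[1,2,3]
["a","b"]
$ jq -n '[ inputs | length ] | add' file.nsj
5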

Related

How to grep lines non-repeatedly for same command?

I have a space-separated file that looks like this:
$ cat in_file
GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1 Chal_sti_synt_C
GCF_000046845.1_ASM4684v1_protein.faa WP_004927566.1 Chal_sti_synt_C
GCF_000046845.1_ASM4684v1_protein.faa WP_004919950.1 FAD_binding_3
GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1 FAD_binding_3
I am using the following shell script utilizing grep to search for strings:
$ cat search_script.sh
grep "GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1" Pfam_anntn_temp.txt
grep "GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1" Pfam_anntn_temp.txt
The problem is that I want each grep command to return only the first instance of the string it finds exclusive of the previous identical grep command's output.
I need an output which would look like this:
$ cat out_file
GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1 Chal_sti_synt_C
GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1 FAD_binding_3
in which line 1 is exclusively the output of the first grep command and line 2 is exclusively the output of the second grep command. How do I do it?
P.S. I am running this on a big file (>125,000 lines). So, search_script.sh is mostly composed of unique grep commands. It is the identical commands' execution that is messing up my downstream analysis.
I'm assuming you are generating search_script.sh automatically from the contents of in_file. If you can count how many times you'll repeat the same grep command, you can just run grep once and pipe it to head. For example, if you know you'll be using it 2 times:
grep "foo" bar.txt | head -2
This will output the first 2 occurrences of "foo" in bar.txt.
If you have to do the grep commands separately, for example if you have other code in between the grep commands, you can mix head and tail:
grep "foo" bar.txt | head -1 | tail -1
Some other commands...
grep "foo" bar.txt | head -2 | tail -1
head -n displays the first n lines of the input
tail -n displays the last n lines of the input
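Applied to the in_file shown above, the second occurrence of a repeated search string can be picked out like this:
$ grep "GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1" in_file | head -2 | tail -1
GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1 FAD_binding_3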
If you really MUST always use the same command, but ensure that the outputs always differ, the only way I can think of to achieve this is using temporary files and a complex sequence of commands:
cat foo.bar.txt.tmp 2>&1 | xargs -I xx echo "| grep -v \\'xx\\' " | tr '\n' ' ' | xargs -I xx sh -c "grep 'foo' bar.txt xx | head -1 | tee -a foo.bar.txt.tmp"
So to explain this command, given foo as a search string and bar.txt as the filename, then foo.bar.txt.tmp is a unique name for a temporary file. The temporary file will hold the strings that have already been output:
cat foo.bar.txt.tmp 2>&1 : outputs the contents of the temporary file. If the file does not exist yet, the error message is sent to stdout instead (important, because if the output were completely empty the rest of the command wouldn't work).
xargs -I xx echo "| grep -v \\'xx\\' " prepends | grep -v to each line of the temporary file; grep -v something excludes lines that contain something.
tr '\n' ' ' replaces newlines with spaces, so the sequence of grep -vs ends up as a single string.
xargs -I xx sh -c "grep 'foo' bar.txt xx | head -1 | tee -a foo.bar.txt.tmp" runs a new command, grep 'foo' bar.txt xx | head -1 | tee -a foo.bar.txt.tmp, replacing xx with the previous output. xx should be the sequence of grep -vs that exclude previous outputs.
head -1 makes sure only one line is output at a time
tee -a foo.bar.txt.tmp appends the new output to the temporary file.
Just be sure to clear the temporary files, rm *.tmp, at the end of your script.
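As a side note, the same "skip lines already printed" idea can be sketched more simply with GNU grep's -x, -F and -f options (an alternative sketch, not the answer's pipeline; foo.bar.txt.tmp again holds the lines already printed):
touch foo.bar.txt.tmp
grep 'foo' bar.txt | grep -vxFf foo.bar.txt.tmp | head -1 | tee -a foo.bar.txt.tmp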
If I am understanding the question correctly and you want to remove duplicates based on the last field of each line, then try the following (this should be an easy task for awk).
awk '!a[$NF]++' Input_file
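For illustration, running it on the in_file shown above keeps the first line seen for each distinct last field:
$ awk '!a[$NF]++' in_file
GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1 Chal_sti_synt_C
GCF_000046845.1_ASM4684v1_protein.faa WP_004919950.1 FAD_binding_3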

Combine netstat command and geoiplookup

How can I combine the following command:
netstat -atun | awk '{print $5}' | cut -d: -f1 | sed -e '/^$/d' | sort | uniq -c | sort -n
with geoiplookup, so that the listing shows something like "Con. Number, IP, Country"?
I am using this lib:
http://kbeezie.com/geoiplookup-command-line/
Thank you for your help!
best regards
You should be able to get it with something like this:
netstat -an -f inet | awk '{print $5}' | sed -e '/^\*\.\*$/d' | awk 'sub(/\.[0-9]+$/,"")' | sort -n | uniq | xargs -n 1 geoiplookup | sort | uniq -c | sort -n | sed -r 's/ GeoIP Country Edition://g'
netstat -an -f inet - shows all network related data structures with network addresses as numbers and pulls the inet address family
awk '{print $5}' - takes that input and prints only the fifth column, the remote address (IP plus port).
sed -e '/^\*\.\*$/d' - strips out all of the *.* lines (sockets with no remote address)
awk 'sub(/\.[0-9]+$/,"")' - strips the trailing port number, leaving just the IP address
sort -n - sorts the addresses numerically so that duplicate IPs end up adjacent
uniq - gets rid of the duplicate IPs
xargs -n 1 geoiplookup - takes each IP address and performs the lookup for the country
sort - sorts based on country name
uniq -c - groups the country names with a count
sort -n - organizes the countries based on the count
sed -r 's/ GeoIP Country Edition://g' - Strips the phrasing "GeoIP Country Edition:"
This has little to do with brute force, other than telling you which country the connections are coming from.
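If everything is in place, the end result is one line per country with a count of distinct remote IPs, roughly of this form (illustrative values):
   3 DE, Germany
  12 US, United States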

View content of files from grep -L

I use grep -L to get a list of files that do not contain a certain string. How can I see the content of those files? Just like:
grep -L "pattern" | cat
You can use xargs:
grep -L "pattern" | xargs cat
As read in man xargs --> build and execute command lines from standard input. So it will cat to those file names that grep -L returns.
You can use cat and use the output of grep -L...
cat $(grep -L "pattern" *.files )
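If the file names may contain spaces, GNU grep's -Z option together with xargs -0 handles them safely (assuming GNU grep and xargs):
grep -LZ "pattern" *.files | xargs -0 cat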

How to grep and execute a command (for every match)

How do I grep in one file and execute a command for every match?
File:
foo
bar
42
foo
bar
I want to execute, for example, date for every match of foo.
The following try doesn't work:
grep file foo | date %s.%N
How to do that?
grep foo file | while read line ; do echo "$line" | date +%s.%N ; done
More readably in a script:
grep foo file | while read line
do
echo "$line" | date %s.%N
done
For each line of input, read will put the value into the variable $line, and the while statement will execute the loop body between do and done. Since the value is now in a variable and not on stdin, I've used echo to push it back into stdin, but you could just do date +%s.%N "$line", assuming date works that way.
Avoid using for line in `grep foo file`, which looks similar, because for always breaks on spaces and this becomes a nightmare for reading lists of files:
find . -iname "*blah*.dat" | while read filename; do ....
would fail with for.
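Putting the above together (note that date needs a leading + before its format string): assuming the contents shown are saved as file, each foo match triggers one date call, so two timestamps are printed:
grep foo file | while read line; do date +%s.%N; done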
What you really need is the xargs command. http://en.wikipedia.org/wiki/Xargs
grep foo file | xargs -I {} date +%s.%N
An example of matching some files and converting the matches to full Windows paths in a Cygwin environment:
$ find $(pwd) -type f -exec ls -1 {} \; | grep '\(_en\|_es\|_zh\)\.\(path\)$' | xargs cygpath -w
grep command_string file | sh -
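This last one-liner works when the matched lines are themselves shell commands: grep selects them and sh - executes each one. A hypothetical illustration, assuming commands.txt holds one command per line:
grep 'date' commands.txt | sh -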
There is an interesting command on Linux for that: xargs. It lets you use the output of a previous command (grep, ls, find, etc.) as the input of a custom command, with several options that even allow you to execute the custom command in parallel. Below are some examples:
Based on your question, here is how to print the date with format "%s.%N" for each "foo" match in file.txt:
grep "foo" file.txt | xargs -I {} date +%s.%N
A more interesting use is creating a file for each match, but in this case, if matches are identical, the file will be overwritten:
grep "foo" file.txt | xargs -I {} touch {}
If you want to append a custom date to the name of the created file:
grep "foo" file.txt | xargs -I {} touch "{}`date +%s.%N`"
Imagine the matches are file names and you want to make a backup of them:
grep "foo" file.txt | xargs -I {} cp {} "{}.backup"
And finally, xargs using the custom date in the backup name:
grep "foo" file.txt | xargs -I {} cp {} "{}`date +%s.%N`"
For more info about options like parallel execution of xargs visit: https://en.wikipedia.org/wiki/Xargs and for date formats: https://www.thegeekstuff.com/2013/05/date-command-examples/
As an extra, I have also found a plain for loop useful in these scenarios. It is simpler but less versatile; below are the equivalents of the above examples:
for i in `grep "foo" test.txt`; do date +%s.%N; done
for i in `grep "foo" test.txt`; do touch ${i}; done
for i in `grep "foo" test.txt`; do touch "${i}`date +%s.%N`"; done
for i in `grep "foo" test.txt`; do cp ${i} "${i}.backup2"; done
for i in `grep "foo" test.txt`; do cp ${i} "${i}.backup2`date +%s.%N`"; done
Have Fun!!!
grep may need the --line-buffered option to emit each matching line as soon as it matches; otherwise it buffers up to 4 KB before printing matching lines, which defeats the goal here, e.g.
tail -f source | grep --line-buffered "expression" | xargs ...
grep search_string files_to_search | sh

Determining word count using grep (in cases where there are multiple words in a line)

Is it possible to determine the number of times a particular word appears using grep?
I tried the "-c" option, but this returns the number of matching lines the particular word appears in.
For example if I have a file with
some words and matchingWord and matchingWord
and then another matchingWord
running grep on this file for "matchingWord" with the "-c" option will only return 2 ...
note: this is the grep command line utility on a standard unix os
grep -o string file will return all matching occurrences of string. You can then do grep -o string file | wc -l to get the count you're looking for.
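For the example file above (assuming it is saved as file):
$ grep -o "matchingWord" file | wc -l
3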
I think that using grep -i -o string file | wc -l should give you the correct output. What happens when you do grep -i -o string file on the file?
You can simply count words (-w) with the wc program:
> echo "foo foo" | grep -o "foo" | wc -w
> 2
