Overall total from du command - grep

I am trying to find out all directories and the overall size starting with pattern int-*
For this I am using the below command
$sudo ls -ld int-* | grep ^d | wc -l
3339
$ sudo ls -ld int-* | grep ^d | du -sh
204G .
Are my commands correct? Is there another command combination to gather the above information?

Simply du -shc ./int-*/ should give the grand total of all directories matching the pattern int-*. Adding the trailing slash restricts the glob to directories.
As per the man page:
-s, report only the sum of the usage in the current directory, not for each directory therein contained
-h, print the results in human-readable format

No, your commands are not okay (though the first is not outright wrong).
Both parse the output of ls, which is a dangerous thing to do: ls is meant to produce human-readable output, and its format may change in the future (indeed it has changed several times over the years, and it differs across Unix flavors). So parsing the output of ls is generally considered bad practice. See http://mywiki.wooledge.org/ParsingLs for details.
The second command also pipes this output into du, but du does not read anything from stdin. It simply ignores the piped input and behaves exactly as if it had been called without the pipe: du -sh. That is of course not what you intended.
What you wanted can best be achieved in a proper fashion like this:
find . -maxdepth 1 -type d -name 'int-*' -printf 'x\n' | wc -l
find . -maxdepth 1 -type d -name 'int-*' -print0 | du --files0-from=- -c
Using the option --files0-from=- the command du does read NUL-separated file names from stdin. -c makes it print a total of all arguments.
Of course you can still add options -h for human-readable sizes (4G etc.) and -s if you do not want sizes for the subdirectories of your arguments.
If you want only the grand total, the best way is to trim the output by piping it into tail -1.
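Putting the pieces together, here is a self-contained sketch (GNU find and du assumed; the int-* directories below are invented fixtures in a scratch directory, not the asker's data):

```shell
# Demo in a scratch directory: count matching directories, then get a
# grand total of their disk usage.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir int-a int-b other
dd if=/dev/zero of=int-a/file bs=1024 count=16 2>/dev/null

# Count the matching directories (one 'x' printed per match):
count=$(find . -maxdepth 1 -type d -name 'int-*' -printf 'x\n' | wc -l)
echo "directories: $count"

# Grand total only: feed NUL-separated names to du, keep the last line.
total=$(find . -maxdepth 1 -type d -name 'int-*' -print0 |
        du --files0-from=- -shc | tail -1)
echo "total: $total"
```

The du line prints one summarized size per directory plus a final "total" line, which tail -1 keeps.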

Related

How to find which step in Dockerfile added some path?

I have a Docker image which contains a file, say /usr/bin/foo. What's the easiest way to find out which step of the Dockerfile added that path? (Which I thought was equivalent to the question, of which layer of the Docker image does that path come from?)
I wrote a script which prints out all the paths in the image, prefixed by layer ID. It appears to work, but is quite slow:
#!/bin/bash
die() { echo 1>&2 "ERROR: $*"; exit 1; }
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
img="$1"
[[ -n "$img" ]] || die "wrong arguments"
docker image save "$img" | (cd "$dir" && tar xf -) ||
    die "failed extracting docker image $img"
(cd "$dir" && find . -name '*.tar' | while read -r f; do
    layer=$(echo "$f" | cut -d/ -f2)
    tar tf "$f" | sed -e "s/^/$layer:/"
done) || die "failed listing layers"
(It could be made faster if it didn't write anything to disk. The problem is while tar tf - prints the paths in the TAR, it doesn't do the same for the nested layer.tar files. I am thinking I could use the Python tarfile module - but surely somebody else out there has done this already?)
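For the avoid-the-disk idea, one hedged shell sketch (GNU tar assumed; `list_layer_paths` is a name I made up, not part of the original script, and it assumes the classic `docker image save` layout where each layer lives at `<layer-id>/layer.tar`): tar -O can stream a nested layer.tar to stdout, and a second tar can list it straight from the pipe.

```shell
# Sketch: list the paths inside every layer of a saved image without
# writing anything to disk. Assumes the classic docker-save layout
# (<layer-id>/layer.tar); list_layer_paths is a hypothetical helper.
list_layer_paths() {
    img=$1
    # Find the nested layer tarballs, then stream each one to stdout
    # with tar -O and list its contents from the pipe.
    tar -tf "$img" | grep 'layer\.tar$' | while read -r layer; do
        tar -xOf "$img" "$layer" | tar -tf - | sed "s|^|${layer%/layer.tar}:|"
    done
}
```

Called as `list_layer_paths image.tar`, this prints the same `layer-id:path` lines as the script above, but via pipes only.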
However, I don't know how to translate the layer ID it gives me to a step in the Docker image. I thought I'd correlate it with the layer IDs reported by docker inspect:
docker image inspect $IMAGE | jq -r '.[].RootFS.Layers[]' | nl
But I can't find the layer ID that my script reports as containing the path anywhere in the output of the above command. (Is that a consequence of BuildKit?)
In the end, I gave up on this whole approach. Instead I just made some educated guesses as to which Dockerfile line was probably creating that path, tested each guess by commenting it out (and all the lines after it), and soon I found the answer. Still, there must be a better way, surely? Ideally, what I'd like is something like a --contains-path= option to docker image history – which doesn't exist, but maybe there is something else which does the equivalent?
While dlayer does not have any search function built in, it is straightforward to implement one by combining it with a Perl one-liner:
docker image save $IMAGE |
dlayer -n 999999 |
perl -ne 'chomp;$query=quotemeta("usr/bin/foo");$cmd=$_ if $_ =~ m/ [\$] /;print "$cmd\n\t$_\n" if m/ $query/;'
This will print something like:
13 MB $ /opt/bar/install.sh # buildkit
637 B usr/bin/foo
-n 999999 raises the limit on the number of file names printed per layer from the default of 100; otherwise the path is only found if it is among the first 100 entries of its layer.
(I submitted a PR to add a built-in search function to dlayer, which removes the need for this one-line Perl script.)

Why is xargs' exit code different based on the presence of "-I" option?

After reading the xargs man page, I am unable to understand the difference in exit codes from the following xargs invocations.
(The original purpose was to combine find and grep to check whether an expression exists in ALL the given files, when I came across this behaviour.)
To reproduce:
(use >>! if using zsh to force creation of file)
# Create the input files.
echo "a" >> 1.txt
echo "ab" >> 2.txt
# The end goal is to check for a pattern (in this case simply 'b') inside
# ALL the files returned by a find search.
find . -name "1.txt" -o -name "2.txt" | xargs -I {} grep -q "b" {}
echo $?
123 # Works as expected since 'b' is not present in 1.txt
find . -name "1.txt" -o -name "2.txt" | xargs grep -q "b"
echo $?
0 # I am more puzzled by why the behaviour is inconsistent
The EXIT_STATUS section on the man page says:
xargs exits with the following status:
0 if it succeeds
123 if any invocation of the command exited with status 1-125
124 if the command exited with status 255
125 if the command is killed by a signal
126 if the command cannot be run
127 if the command is not found
1 if some other error occurred.
I would have thought that "123 if any invocation of the command exited with status 1-125" should apply irrespective of whether or not -I is used?
Could you share any insights to explain this conundrum please?
Here is evidence of the effect of -I option with xargs with the help of a wrapper script which shows the number of invocations:
cat ./grep.sh
#!/bin/bash
echo "I am being invoked at $(date +%Y%m%d_%H-%M-%S)"
grep "$@"
(the actual command being invoked, in this case grep doesn't really matter)
Now execute the same commands as in the question using the wrapper script instead:
❯ find . -name "1.txt" -o -name "2.txt" | xargs -I {} ./grep.sh -q "b" {}
I am being invoked at 20190410_09-46-29
I am being invoked at 20190410_09-46-30
❯ find . -name "1.txt" -o -name "2.txt" | xargs ./grep.sh -q "b"
I am being invoked at 20190410_09-46-53
I have just discovered a comment on the answer of a similar question that answers this question (complete credit to https://superuser.com/users/49184/daniel-andersson for his wisdom):
https://superuser.com/questions/557203/xargs-i-behaviour#comment678705_557230
Also, unquoted blanks do not terminate input items; instead the separator is the newline character. — this is central to understanding the behavior. Without -I, xargs only sees the input as a single field, since newline is not a field separator. With -I, suddenly newline is a field separator, and thus xargs sees three fields (that it iterates over). That is a real subtle point, but is explained in the man page quoted.
-I replace-str
Replace occurrences of replace-str in the initial-arguments
with names read from standard input. Also, unquoted blanks do
not terminate input items; instead the separator is the
newline character. Implies -x and -L 1.
Based on that,
find . -name "1.txt" -o -name "2.txt"
#returns
# ./1.txt
# ./2.txt
xargs -I {} grep -q "b" {}
# -I makes the newline a field separator and implies -L 1, so each
# input line is substituted into a separate command line. This results
# in TWO invocations of grep; one of them exits 1, so xargs returns
# 123 as documented in the EXIT_STATUS section.
xargs grep -q "b"
# Without -I, both newline-separated names are collected as arguments
# to a SINGLE grep invocation, which exits 0 because the pattern is
# found in one of the files.
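The whole difference can be reproduced end to end in a scratch directory (GNU xargs and grep assumed):

```shell
# Demo: identical input, different xargs exit codes with and without -I.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
echo "a"  > 1.txt
echo "ab" > 2.txt

# With -I: one grep per input line; the grep on 1.txt exits 1,
# so xargs exits 123.
find . -name "1.txt" -o -name "2.txt" | xargs -I {} grep -q "b" {}
with_i=$?

# Without -I: a single grep gets both names and exits 0 (match in 2.txt).
find . -name "1.txt" -o -name "2.txt" | xargs grep -q "b"
without_i=$?

echo "with -I: $with_i, without -I: $without_i"   # with -I: 123, without -I: 0
```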

Simple way to order grep's result by time (reverse)?

I often have to look for specific strings in a big set of log files with grep, and I get lots of results that I have to scroll through.
Today, grep lists its results in alphabetical order. I would like my grep results reverse-ordered by time, as ls -ltr would do.
I know I could take the result of ls -ltr and grep file by file. I do it like this:
ls -ltr ${my_log_dir}\* | awk '{print $9}' |xargs grep ${my_pattern}
But I wonder: Is there a simpler way?
PS: I'm using ksh on AIX.
The solution I found (thanks to Fedorqui) was to simply use ls -tr: the results are passed in the right order through the pipe to xargs, which then runs the grep.
My misconception was that, since ls displays its results in multiple columns rather than as a single-column list, I thought it could not work as input for xargs.
Here is the simplest solution to date, since it avoids any awk parsing:
ls -tr ${my_log_dir}\* | xargs grep ${my_pattern}
I checked, and every result of ls -t is passed to xargs, even though I expected the multi-column output not to fit (when its output goes to a pipe rather than a terminal, ls prints one name per line):
srv:/papi/tata $ ls -t
addl ieet rrri
ooij lllr sss
srv:/papi/tata $ ls -t |xargs -I{} echo {}
addl
ieet
rrri
ooij
lllr
sss
This will also work, using the find command:
find -type f -print0 | xargs -r0 stat -c %y\ %n | sort -r | awk '{print $4}' | sed "s|^\./||"
-print0 makes find NUL-terminate the file names, preserving names that contain special characters (whitespace, tabs)
stat -c '%y %n' prints the file status with %y (time of last modification) and %n (file name), one file per line
sort -r sorts the output of the previous command in reverse
awk '{print $4}' prints only the file name (can be refined as needed)
sed removes the leading ./ from the file names
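A NUL-safe variant of the same idea, sketched as a self-contained demo (GNU find, sort, cut, and xargs assumed; %T@ prints the modification time in seconds since the epoch, so no awk field-splitting is needed and names with spaces survive):

```shell
# Demo in a scratch directory: grep files oldest-first by modification
# time, NUL-separated end to end. The file names are invented fixtures.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
echo needle > old.txt
echo needle > new.txt
touch -d '2020-01-01' old.txt    # make old.txt the older file

# Sort numerically on the epoch timestamp, drop the timestamp column,
# and hand the NUL-separated names to grep in one batch.
find . -type f -printf '%T@\t%p\0' |
    sort -zn | cut -z -f2- |
    xargs -r0 grep -H needle     # prints ./old.txt:needle then ./new.txt:needle
```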

Listing non matching entries using 'grep -f'

The following command gives me a list of matching expressions:
grep -f /tmp/list Filename* > /tmp/output
The list file is then parsed and used to search Filename* for the parsed string. The results are then saved to output.
How would I output the parsed string from list in the case where there is no match in Filename*?
Contents of the list file could be:
ABC
BLA
ZZZ
HJK
Example Files:
Filename1:5,ABC,123
Filename2:5,ZZZ,342
Result of Running Command:
BLA
HJK
Stack Overflow question 2480584 looks like it may be relevant, through the use of an if statement. However, I'm not sure how to output the parsed string to the output file. Would this require some kind of read loop?
TIA,
Mic
Obviously, grep -f list Filename* gives all matches of patterns from the file list in the files specified by Filename*, i.e.,
Filename1:5,ABC,123
Filename2:5,ZZZ,342
in your example.
By adding the -o (only print matching expression) and -h (do not print filename) flags, we can turn this into:
ABC
ZZZ
Now you want all patterns from list that are not contained in this list, which can be achieved by
grep -oh -f list Filename* | grep -v -f /dev/stdin list
where the second grep takes its patterns from the output of the first and, with the -v flag, prints all the lines of the file list that do not match any of those patterns.
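A runnable end-to-end check of this approach, with the question's example files recreated as fixtures in a scratch directory:

```shell
# Demo: print the patterns from `list` that match nothing in Filename*.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf 'ABC\nBLA\nZZZ\nHJK\n' > list
echo '5,ABC,123' > Filename1
echo '5,ZZZ,342' > Filename2

# The first grep emits the patterns that did match; the second inverts
# that set against the list.
grep -oh -f list Filename* | grep -v -f /dev/stdin list   # prints BLA and HJK
```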
Alternatively, this can be done in a single step by extracting the matched field with cut and inverting the match:
$ grep -v "$(cat Filename* | cut -d, -f2)" /tmp/list
BLA
HJK
Explanation
$ cat Filename* | cut -d, -f2
ABC
ZZZ
and grep -v then prints the lines of /tmp/list that do not match any of these.
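The same fixtures verify the cut-based variant (the newline-separated result of the command substitution acts as multiple grep patterns):

```shell
# Demo of the cut-based variant with the example files recreated.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf 'ABC\nBLA\nZZZ\nHJK\n' > list
echo '5,ABC,123' > Filename1
echo '5,ZZZ,342' > Filename2

# cut extracts the second comma-separated field of each data line;
# grep -v drops the list entries that match any of them.
grep -v "$(cat Filename* | cut -d, -f2)" list   # prints BLA and HJK
```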

use cat and grep to look for a bunch of files and give me info while retaining file name

so I am running the following command
cat /Users/sars/logs/testlogs/2012-04-02*/*/top |grep -H "httpd"
I'm using the * because there are a bunch of directories (which is actually the information I am looking for) and looking for the phrase httpd in the top output
But when I do this I get (standard input): 4951 root 1 96 0 14052K 6844K select 2 0:12 0.00% httpd
instead of the filename
how do I go through these directories look in the top file and find the lines with httpd in them while maintaining the name and path of the file it is found in?
grep can take the filenames as arguments:
grep -H "httpd" /Users/sars/logs/testlogs/2012-04-02*/*/top
ack is a good tool for this sort of thing, but it is nonstandard. To do it with grep, you probably want to use find (note that top is a file here, so the pattern must end in /top, not /top/*):
find /Users/sars/logs/testlogs -type f \
     -path '/Users/sars/logs/testlogs/2012-04-02*/*/top' \
     -exec grep -H httpd {} +
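A quick runnable illustration of why the filename disappears through the pipe (the log path and contents below are invented fixtures):

```shell
# Demo: grep -H shows a real file name only when grep opens the files
# itself; piped input is labelled "(standard input)".
tmp=$(mktemp -d)
mkdir -p "$tmp/2012-04-02_01/host1"
echo '4951 root httpd' > "$tmp/2012-04-02_01/host1/top"

cat "$tmp"/2012-04-02*/*/top | grep -H httpd   # (standard input):4951 root httpd
grep -H httpd "$tmp"/2012-04-02*/*/top         # .../top:4951 root httpd
```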
