Exclude common subdirectories when creating a tarball - tar

I'm creating a tarball of a large codebase managed in ClearCase. Every directory has a sub-directory named ".CC". I'd like to exclude these from my tarball.
I've found Excluding directory when creating a .tar.gz file, but that approach would appear to require passing each and every .CC directory on the command line. This is impractical in my case.
Is there a way to exclude directories that meet a particular pattern?
EDIT:
I am not asking how to exclude a specific finite list of directories. I am asking how to exclude all directories that end in a particular pattern.

Instead of manually typing --exclude 'root/a/.CC' --exclude 'root/b/.CC' ... you can type $(find root -type d -name .CC -exec echo "--exclude={}" \; | xargs). (Don't embed escaped quotes in the echo: they would end up as literal quote characters inside the arguments tar receives.)
You can use whatever patterns find supports, or even use something like grep in between find and xargs.
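As an aside (not part of the answer above, and assuming GNU tar; the tarball name is illustrative): tar's own --exclude option takes a pattern that is matched against every member name, so a single flag can cover all the .CC directories without any find at all:

# GNU tar matches the --exclude pattern against each member name,
# so every directory named .CC is skipped, at any depth
tar -czvf codebase.tar.gz --exclude='.CC' root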

The following bash script should do the trick. It uses the answer given by @Marcus Sundman.
#!/bin/bash
echo -n "Please enter the name of the tar file you wish to create (without extension): "
read nam
echo -n "Please enter the path to the directories to tar: "
read pathin
# One --exclude=<dir> per match; left unquoted below so it splits into separate flags
excludes=$(find "$pathin" -iname "*.CC" -exec echo "--exclude={}" \; | xargs)
echo tar -czvf "$nam.tar.gz" $excludes "$pathin"
This will print out the command you need, and you can just copy and paste it back in. There is probably a more elegant way to pass it to the command line directly.
*.CC could be exchanged for any other common suffix and this should still work.
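For the "more elegant way" mentioned above, here is a sketch (my own, assuming bash and GNU find; $nam and $pathin are the variables read by the script) that collects the excludes into an array and runs tar directly, which also survives paths containing spaces:

#!/bin/bash
# Collect one --exclude=<dir> argument per match into an array,
# then run tar directly instead of echoing a command to copy/paste.
excludes=()
while IFS= read -r -d '' dir; do
    excludes+=("--exclude=$dir")
done < <(find "$pathin" -type d -iname "*.CC" -print0)
tar -czvf "$nam.tar.gz" "${excludes[@]}" "$pathin"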

Related

Unix: parse a file of full paths to SHA256 checksum files and run a command on each

I have a file file.txt with filenames ending with *.sha256, including the full paths of each file. This is a toy example:
file.txt:
/path/a/9b/x3.sha256
/path/7c/7j/y2.vcf.gz.sha256
/path/e/g/7z.sha256
Each line has a different path/file. The *.sha256 files have checksums.
I want to run the command "sha256sum -c" on each of these *.sha256 files and write the output to an output_file.txt. However, the check fails when the checksum file is given with its full path, because each .sha256 file lists its target by bare filename, which is resolved relative to the current directory. I have tried the following:
while read in; do
sha256sum -c "$in" >> output_file.txt
done < file.txt
but I get:
"sha256sum: WARNING: 1 listed file could not be read"
which happens because the filename listed inside each .sha256 file is resolved relative to the current working directory, not relative to the checksum file itself.
Any suggestion is welcome
#!/bin/bash
# Each .sha256 file lists its target by bare filename, so the check
# must run from the directory that contains the checksum file.
while IFS= read -r in
do
    thedir=$(dirname "$in")
    thefile=$(basename "$in")
    # Subshell: the cd doesn't leak into the next iteration, and the
    # single redirection below keeps output_file.txt in the start directory
    ( cd "$thedir" && sha256sum -c "$thefile" )
done < file.txt > output_file.txt
Modify your code to extract the directory and file parts of your $in variable, then run the check from inside that directory.
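As an alternative sketch (my own, and only applicable if the .sha256 files can be discovered by pattern rather than read from file.txt): GNU find's -execdir runs the command from each matched file's own directory, which sidesteps the cd entirely. The /path starting point is taken from the toy example:

# -execdir changes into each file's directory before invoking the command,
# so the bare filenames listed inside each .sha256 file resolve correctly
find /path -name '*.sha256' -execdir sha256sum -c '{}' \; > output_file.txt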

How do I use zgrep to look for content in archived files matching a given filename pattern?

Suppose I have two tar.gz files
a1.tar.gz
a2.tar.gz
and each archive contains many files, including a file called
target.txt
How do I search for BLAH in target.txt in both of these archives using zgrep without searching all of the other files in each archive?
If I try
zgrep -a BLAH *.tar.gz
then that searches all files in each archive, and if I try
zgrep --include=target.txt -a BLAH *.tar.gz
then I get
zgrep: --include=target.txt: option not supported
You can use zgrep, but it will not isolate the file you are looking for: it decompresses the archive and searches the entire tar stream for matches. It can tell you which tarball contains a match for BLAH, but it cannot restrict the search to the target.txt member inside each archive.
There is an open source tool called ugrep to search archives (zip, tar, pax, cpio, jar) and tar.gz tarballs. Use options -z and -g target.txt to search for BLAH in target.txt in all of the tar files found recursively:
ugrep -z "BLAH" -g target.txt
To search the working directory only without recursing deeper:
ugrep -z "BLAH" -g target.txt .
Note that option -g takes a glob. Use a quoted glob like -g "*.txt" to match all .txt files.
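If installing ugrep is not an option, a plain GNU tar loop can approximate the same search (a sketch of my own; the --wildcards pattern assumes target.txt may sit at any depth inside each archive):

# -O extracts the matching member(s) to stdout instead of to disk;
# --label makes grep report which tarball the match came from
for f in *.tar.gz; do
    tar -xzf "$f" -O --wildcards '*target.txt' | grep -H --label="$f" BLAH
done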

Dockerignore: allow to add only specific extension like *.json from any subfolder

I have a .dockerignore file and I'm trying to allow Docker to include only *.json files, from any of the subfolders.
For example, for the next files structure:
public/readme.md
public/subfolder/a.json
public/subfolder/b.json
public/other/c.json
public/other/file.txt
I'm expecting to see only json files in the image:
public/subfolder/a.json
public/subfolder/b.json
public/other/c.json
Of course they must keep the same directory layout as in the original source.
I tried several ways but didn't succeed.
Update: I don't know how many subfolders will be created in the public/ directory, nor how deep the directory structure will be.
I think you can achieve what you want with a .dockerignore like this:
public/*
!public/subfolder
public/subfolder/*
!public/other
public/other/*
!**/*.json
The tricky thing is that the first line of this file must be public/*, not public nor * (otherwise the subsequent !... lines won't work).
Note also that you may want to automate the generation of such a .dockerignore, to cope with possible changes in the tree structure.
For example:
gen-dockerignore.sh
#!/usr/bin/env bash
{ echo '*' ; # header of the .dockerignore - to be changed if need be
find public -type d -exec echo -en "!{}\n{}/*\n" \; ;
echo '!**/*.json' ; } > .dockerignore
Running ./gen-dockerignore.sh would output the following file:
.dockerignore
*
!public
public/*
!public/other
public/other/*
!public/subfolder
public/subfolder/*
!**/*.json
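To verify what actually ends up in the build context (my own sketch; the throwaway Dockerfile name and image tag are illustrative, and depending on your Docker version you may need a !Dockerfile.test line in the .dockerignore), build a minimal image that copies the context and lists it:

# Throwaway Dockerfile that copies the entire context so we can inspect it
cat > Dockerfile.test <<'EOF'
FROM busybox
COPY . /ctx
RUN find /ctx -type f
EOF
# Under BuildKit, add --progress=plain to see the RUN output
docker build -f Dockerfile.test -t ignore-test .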

How to use grep to search only in a specific file types?

I have a lot of files and I want to find where is MYVAR.
I'm sure it's in one of .yml files but I can't find in the grep manual how to specify the filetype.
grep -rn --include='*.yml' "MYVAR" your_directory
Please note that grep is case-sensitive by default (pass -i to ignore case), and that the pattern is a regular expression, not just a plain string.
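For instance, a case-insensitive version of the same search (the directory name is illustrative):

grep -rni --include='*.yml' "myvar" your_directory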
You don't give grep a filetype, just a list of files. Your shell can expand a pattern to give grep the correct list of files, though:
$ grep MYVAR *.yml
If your .yml files aren't all in one directory, it may be easier to up the ante and use find:
$ find -name '*.yml' -exec grep MYVAR {} \+
This will find, from the current directory and recursively deeper, any files ending with .yml, and substitutes that list of files in place of the braces {}. The trailing + terminates the -exec clause and tells find to pass as many filenames as possible to each grep invocation (unlike \;, which runs grep once per file). The result is the list of matching files handed to grep in batches.
If all your .yml files are in one directory, then cd to that directory, and then ...
grep MYVAR *.yml
If all your .yml files are in multiple directories, then cd to the top of those directories, and then ...
grep MYVAR `find . -name \*.yml`
If you don't know the top of those directories where your .yml files are located and want to search the whole system ...
grep MYVAR `find / -name \*.yml`
The last option may require root privileges to read through all directories.
The ` character above is the backquote (backtick), found on the same key as ~ on a US keyboard.
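One caveat (my own note): command substitution with backticks splits the file list on whitespace, so any path containing a space will break. A null-delimited pipeline avoids that:

# -print0 and xargs -0 keep filenames containing spaces intact
find . -name '*.yml' -print0 | xargs -0 grep MYVAR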
find . -name \*.yml -exec grep -Hn MYVAR {} \;

grep in all directories

I have a directory named XYZ which has directories ABC, DEF, GHI inside it. I want to search for a pattern 'writeText' in all *.c in all directories (i.e XYZ, XYZ/ABC, XYZ/DEF and XYZ/GHI)
What grep command can I use?
Also if I want to search only in XYZ, XYZ/ABC, XYZ/GHI and not XYZ/DEF, what grep command can I use?
Thank you!
grep -R --include="*.c" --exclude-dir=DEF writeText /path/to/XYZ
-R means recursive, so it will go into subdirectories of the directory you're grepping through
--include="*.c" means "look for files ending in .c"
--exclude-dir=DEF means "exclude directories named DEF". To exclude multiple directories, use brace expansion: --exclude-dir={DEF,GBA,XYZ}. (The braces only expand, and are only needed, when there is more than one name; --exclude-dir={DEF} would be passed literally.)
writeText is the pattern you're grepping for
/path/to/XYZ is the path to the directory you want to grep through.
Note that these flags apply to GNU grep and might differ if you're using BSD/SysV/AIX grep. If you're using Linux/GNU grep utils you should be fine.
You can use the following command to answer at least the first part of your question.
find . -name '*.c' | xargs grep "writeText"
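For the second part of the question (searching everywhere except XYZ/DEF), here is a sketch using find's -prune (directory names taken from the question; run it from inside XYZ):

# -prune stops find from descending into DEF; the -o branch collects .c files
find . -type d -name DEF -prune -o -name '*.c' -print0 | xargs -0 grep "writeText"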
