Unix: parse a file of full paths to *.sha256 checksum files and run a command on each

I have a file file.txt containing filenames ending in .sha256, each given with its full path. This is a toy example:
file.txt:
/path/a/9b/x3.sha256
/path/7c/7j/y2.vcf.gz.sha256
/path/e/g/7z.sha256
Each line has a different path/file. The *.sha256 files have checksums.
I want to run "sha256sum -c" on each of these *.sha256 files and write the output to an output_file.txt. However, each .sha256 file lists the file it checks by bare name, so the check only succeeds when run from that file's own directory. I have tried the following:
while read in; do
sha256sum -c "$in" >> output_file.txt
done < file.txt
but I get:
"sha256sum: WARNING: 1 listed file could not be read"
which happens because the file named inside the .sha256 file is looked up relative to the current working directory, not relative to the directory the checksum file lives in.
Any suggestion is welcome.

#!/bin/bash
# Write all results to one file in the starting directory, not into each file's directory.
out=$PWD/output_file.txt
while IFS= read -r in
do
    thedir=$(dirname "$in")
    thefile=$(basename "$in")
    # cd in a subshell so the loop's own working directory never changes.
    (cd "$thedir" && sha256sum -c "$thefile") >> "$out"
done < file.txt
Modify your code to split each line held in $in into its directory and file parts, change into that directory, and run the check on the bare filename; doing the cd in a subshell keeps the loop's working directory and the output file location fixed.
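If you do not strictly need to drive the loop from file.txt, a shorter alternative (a sketch, assuming GNU findutils and the toy /path tree from the example) is to let find locate the .sha256 files and run the check from each file's own directory with -execdir:
find /path -name '*.sha256' -execdir sha256sum -c '{}' \; > output_file.txt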


How to copy multiple files in a directory and move each into its correct directory

Unix shell: ksh
I created a file list and am currently trying to copy each file to its correct path.
(mylist)
-1111
-2222
-3333
-4444
-5555
current directory
/sample/dir/unknown/
-1111fileneeded.txt
-2222fileneeded.txt
-3333fileneeded.txt
-4444fileneeded.txt
-5555fileneeded.txt
-6666dontneed.txt
-7777dontneed.txt
-8888dontneed.txt
...etc
The first 4 characters of each filename match the directory it needs to go to.
/sample/dir/1111/
/sample/dir/2222/
/sample/dir/3333/
/sample/dir/4444/
So here is what I currently have:
for i in `cat mylist`
do echo "$i"
find /sample/dir/unknown/mylist*
This is where I am stuck; I am trying to figure out what needs to be done to move each file into its correct directory.
This should work:
#!/bin/ksh
# filelist.txt holds the filenames found in /sample/dir/unknown/, one per line.
while IFS= read -r line; do
    dir=$(echo "$line" | cut -c 2-5)
    mv "/sample/dir/unknown/$line" "/sample/dir/$dir/$line"
done < filelist.txt
Clearing IFS keeps read from trimming leading or trailing whitespace in a filename, just in case.
cut -c 2-5 takes characters 2 through 5 (skipping the dash at the start of the file name), which gives the four-character directory ID.
Let me know if there is something else you don't understand.
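If you would rather drive the loop from mylist itself, here is a sketch along the same lines, assuming the IDs in mylist carry the leading dash shown above and that the target directories /sample/dir/<id>/ already exist:
while read -r id; do
    id=${id#-}                     # drop the leading dash from the list entry
    mv /sample/dir/unknown/"$id"*.txt "/sample/dir/$id/"
done < mylist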

command line to convert all .docx files in a directory (and subdirectories) to text files and write the new files

I would like to convert all .docx files in a directory (and subdirectories) to text files from the command line (so I can then grep these files). I found this
unzip -p tutu.docx word/document.xml | sed -e 's/<\/w:p>/\n/g; s/<[^>]\{1,\}>//g; s/[^[:print:]\n]\{1,\}//g'
here, which works well but sends the output to the terminal. I would like to write the new text file (.txt, for instance) in the same directory as the .docx file, and I would like a script to do this recursively.
I have this, using antiword, which does what I want for .doc files, but it doesn't work for .docx files.
find . -name '*.doc' | while read i; do antiword -i 1 "${i}" >"${i/doc/txt}"; done
I tried to mix both but without success... A command line that would do both at the same time would be appreciated!
Thank you
You can use pandoc to convert docx files. It doesn't support .doc files so you will need both pandoc and antiword.
Reusing your while loop:
find . -name '*.docx' | while read i; do pandoc --from docx --to plain "${i}" >"${i/docx/txt}"; done
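If you want a single pass that handles both formats, here is a sketch assuming both pandoc and antiword are installed; the case statement routes each file to the right converter:
find . \( -name '*.docx' -o -name '*.doc' \) | while IFS= read -r i; do
    case "$i" in
        *.docx) pandoc --from docx --to plain "$i" > "${i%.docx}.txt" ;;
        *.doc)  antiword -i 1 "$i" > "${i%.doc}.txt" ;;
    esac
done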
The following script converts all .docx files under the directory where you run it, recursively (adapt the . in find . to your desired starting point), and writes each .txt file to the directory where it found the corresponding .docx file.
Bash script:
find . -name "*.docx" | while read file; do
unzip -p $file word/document.xml |
sed -e 's/<[^>]\{1,\}>//g; s/[^[:print:]]\{1,\}//g' > "${file/docx/txt}"
done
Afterwards you can run the grep like this:
grep -r "some text" --include "*.txt" .

How could I untar all .tar files in a directory into folders named after each .tar file?

I could do this for .zip files in the folder using the command below:
for f in "!"; do unzip -d "${f%*.zip}" "$f"; done
The above command extracts all .zip files in a given folder to subfolders, having content and name of respective .zip files.
But I couldn't find a command that would do the same for .tar files. Please help.
Btw, I am trying to do this on a remote server using WinSCP/PuTTY, so I cannot use GUI software. I need a command, hence the question.
After a bit of fiddling I came up with
for f in $(find . -maxdepth 1 -name '*.tar'); do mkdir "${f%.tar}"; tar -xaf "$f" -C "${f%.tar}"; done
which appears to work, as long as the file names do not contain spaces. I assume you want the directory created from foo.tar to be named foo (without the extension); it could not be named foo.tar anyway, because the archive of that name still sits in the same folder.
IIRC, the remote access client Cyberduck can handle compressed files in a GUI, so you can try that if you're fine with a GUI solution.
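A space-safe variant of the same idea, as a sketch assuming bash and GNU tar, globs the archives in the current directory instead of parsing find output:
for f in ./*.tar; do
    d="${f%.tar}"                 # directory named after the archive, extension stripped
    mkdir -p "$d" && tar -xaf "$f" -C "$d"
done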

extract a single file from a tar.xz archive

I have a huge file file.tar.xz containing many smaller text files with a similar structure. I want to quickly examine one file from the archive to get a glimpse of the files' content structure. I don't have any information about the names of the files inside the archive. Is there any way to extract a single file in this scenario?
Thank you.
EDIT: I don't want to tar -xvf file.tar.xz.
Based on the discussion in the comments, I tried the following, which worked for me. It might not be the most optimal solution and the regex might need some improvement, but you'll get the idea.
I first created a demo archive:
cd /tmp
mkdir demo
for i in {1..100}; do echo $i > "demo/$i.txt"; done
cd demo && tar cfJ ../demo.tar.xz * && cd ..
demo.tar.xz now contains 100 txt files.
The following lists the contents of the archive, selects the first file and stores the path within the archive into the variable firstfile:
firstfile=`tar -tvf demo.tar.xz | grep -Po -m1 "(?<=:[0-9]{2} ).*$"`
echo $firstfile will output 1.txt.
You can now extract this single file from the archive:
tar xf demo.tar.xz "$firstfile"
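A slightly simpler variant, as a sketch that avoids parsing the verbose listing with a lookbehind regex, takes the first name from the plain listing instead:
firstfile=$(tar -tf demo.tar.xz | head -n 1)
tar -xf demo.tar.xz "$firstfile"
Quoting $firstfile guards against spaces in the member name.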

Exclude common subdirectories when creating a tarball

I'm creating a tarball of a large codebase managed in ClearCase. Every directory has a sub-directory named ".CC". I'd like to exclude these from my tarball.
I've found Excluding directory when creating a .tar.gz file, but that approach would appear to require passing each and every .CC directory on the command line. This is impractical in my case.
Is there a way to exclude directories that meet a particular pattern?
EDIT:
I am not asking how to exclude a specific finite list of directories. I am asking how to exclude all directories that end in a particular pattern.
Instead of manually typing --exclude 'root/a/.CC' --exclude 'root/b/.CC' ... you can type $(find root -type d -name .CC -exec echo "--exclude={}" \; | xargs), assuming none of the paths contain whitespace.
You can use whatever patterns find supports, or even insert something like grep between find and xargs.
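For example, embedding that substitution directly in the tar invocation might look like this (a sketch, assuming root/ is the tree being archived, no path contains whitespace, and codebase.tar.gz is a placeholder archive name):
tar -czvf codebase.tar.gz $(find root -type d -name .CC -exec echo "--exclude={}" \; | xargs) root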
The following bash script should do the trick. It uses the answer given by @Marcus Sundman.
#!/bin/bash
echo -n "Please enter the name of the tar file you wish to create, without extension: "
read nam
echo -n "Please enter the path to the directories to tar: "
read pathin
excludes=$(find "$pathin" -iname "*.CC" -exec echo "--exclude={}" \; | xargs)
echo tar -czvf "$nam.tar.gz" $excludes "$pathin"
This will print out the command you need, and you can just copy and paste it back in. There is probably a more elegant way to provide it directly to the command line.
*.CC could be exchanged for any other common suffix and this should still work.
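If your tar is GNU tar, there is also a shorter route that skips find entirely: --exclude accepts a pattern that is matched against the path components of each member, so excluding the bare directory name should drop every .CC directory in the tree. A sketch, assuming GNU tar, that nothing you want to keep is named .CC, and with codebase.tar.gz and root/ as placeholder names:
tar -czvf codebase.tar.gz --exclude='.CC' root/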
