How could I untar all .tar files in a directory to folders based on the filename of each .tar?

I could do this for .zip files in the folder using the command below:
for f in *.zip; do unzip -d "${f%.zip}" "$f"; done
The above command extracts every .zip file in a given folder into a subfolder named after the respective .zip file (minus the extension), containing its contents.
But I couldn't find a command that would do the same for .tar files. Please help.
Btw, I am trying to do this on a remote server using WinSCP/PuTTY, so I cannot use GUI software. I need a command, thus the question.

After a bit of fiddling I came up with
for f in $(find . -maxdepth 1 -name '*.tar'); do mkdir "${f%.tar}"; tar -xaf "$f" -C "${f%.tar}"; done
which appears to work, so long as the file names do not contain any spaces (see the space-safe sketch below). I assume you wanted the directory from foo.tar to be named foo (no file extension). If you want the directory to be named foo.tar (with the file extension), it has to live somewhere else, since a file and a directory cannot share a name in the same folder; for example:
for f in $(find . -maxdepth 1 -name '*.tar'); do mkdir -p extracted/"$f"; tar -xaf "$f" -C extracted/"$f"; done
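If the file names may contain spaces, a null-delimited loop avoids the word-splitting problem. A minimal sketch, assuming GNU find and a tar that understands -a:
find . -maxdepth 1 -name '*.tar' -print0 | while IFS= read -r -d '' f; do
    dir="${f%.tar}"    # strip the .tar extension for the directory name
    mkdir -p "$dir"    # -p: no error if the directory already exists
    tar -xaf "$f" -C "$dir"
done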
IIRC, the remote access client Cyberduck can handle compressed files in a GUI - so you can try that if you're fine with a GUI solution.

Related

Unix: parse a file of full paths to SHA256 checksum files and run a command on each path/file

I have a file file.txt with filenames ending in .sha256, each given with its full path. This is a toy example:
file.txt:
/path/a/9b/x3.sha256
/path/7c/7j/y2.vcf.gz.sha256
/path/e/g/7z.sha256
Each line has a different path/file. The *.sha256 files have checksums.
I want to run the command "sha256sum -c" on each of these *.sha256 files and write the output to an output_file.txt. However, this command only accepts the name of the .sha256 file, not the name including its full path. I have tried the following:
while read in; do
sha256sum -c "$in" >> output_file.txt
done < file.txt
but I get:
"sha256sum: WARNING: 1 listed file could not be read"
which is due to the path included in the command.
Any suggestion is welcome.
#!/bin/bash
while IFS= read -r in
do
    thedir=$(dirname "$in")
    thefile=$(basename "$in")
    # run sha256sum from the checksum file's own directory, in a subshell
    # so the cd does not leak; the redirection keeps output_file.txt in
    # the directory the script was started from
    ( cd "$thedir" && sha256sum -c "$thefile" ) >> output_file.txt
done < file.txt
Modify your code to extract the directory and file parts of your in variable; sha256sum has to run from the checksum file's directory because the .sha256 file refers to its target by relative name.
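Alternatively, if you can search the tree directly instead of reading file.txt, GNU find's -execdir runs its command from each matched file's own directory, which sidesteps the cd entirely. A sketch, assuming the checksum files live under /path:
find /path -name '*.sha256' -execdir sha256sum -c '{}' \; > output_file.txt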

command line to convert all .docx in a directory (and subdirectories) to text files and write the new files

I would like to convert all .docx files in a directory (and subdirectories) to text files from the command line (so I can use grep after on these files). I found this
unzip -p tutu.docx word/document.xml | sed -e 's/<\/w:p>/\n/g; s/<[^>]\{1,\}>//g; s/[^[:print:]\n]\{1,\}//g'
here, which works well, but it sends the output to the terminal. I would like to write the new text file (.txt for instance) to the same directory as the .docx file. And I would like a script to do this recursively.
I have this, using antiword, that does what I want for .doc files, but it doesn't work for .docx files.
find . -name '*.doc' | while read i; do antiword -i 1 "${i}" >"${i/doc/txt}"; done
I tried to mix both but without success... A command line that would do both at the same time would be appreciated!
Thank you
You can use pandoc to convert docx files. It doesn't support .doc files so you will need both pandoc and antiword.
Reusing your while loop:
find . -name '*.docx' | while read -r i; do pandoc --from docx --to plain "${i}" > "${i%.docx}.txt"; done
The following script:
converts all docx files in the directory where you run it, recursively (adapt the . in find . to your desired starting point)
writes the txt files to where it found the docx files
Bash script:
find . -name "*.docx" | while read -r file; do
    unzip -p "$file" word/document.xml |
    sed -e 's/<\/w:p>/\n/g; s/<[^>]\{1,\}>//g; s/[^[:print:]\n]\{1,\}//g' > "${file%.docx}.txt"
done
Afterwards you can run the grep like this:
grep -r "some text" --include "*.txt" .
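If you want a single pass that handles both .doc and .docx, as the question asks, you can branch on the extension. A sketch, assuming both antiword and pandoc are installed; the -print0/read -d '' pairing keeps file names with spaces intact:
find . -type f \( -name '*.doc' -o -name '*.docx' \) -print0 |
while IFS= read -r -d '' f; do
    case "$f" in
        *.docx) pandoc --from docx --to plain "$f" > "${f%.docx}.txt" ;;
        *.doc)  antiword -i 1 "$f" > "${f%.doc}.txt" ;;
    esac
done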

extract a file from xz file

I have a huge file file.tar.xz containing many smaller text files with a similar structure. I want to quickly examine one file from the archive to get a glimpse of the files' content structure. I don't have information about the names of the files within the archive. Is there any way to extract a single file given this scenario?
Thank you.
EDIT: I don't want to tar -xvf file.tar.xz.
Based on the discussion in the comments, I tried the following, which worked for me. It might not be the most optimal solution and the regex might need some improvement, but you'll get the idea.
I first created a demo archive:
cd /tmp
mkdir demo
for i in {1..100}; do echo $i > "demo/$i.txt"; done
cd demo && tar cfJ ../demo.tar.xz * && cd ..
demo.tar.xz now contains 100 txt files.
The following lists the contents of the archive, selects the first file and stores the path within the archive into the variable firstfile:
firstfile=$(tar -tvf demo.tar.xz | grep -Po -m1 "(?<=:[0-9]{2} ).*$")
echo $firstfile will output 1.txt.
You can now extract this single file from the archive:
tar xf demo.tar.xz "$firstfile"
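If the goal is just a glimpse of the content, you can skip the verbose listing and the regex entirely: a plain -t listing gives bare member names, and GNU tar's -O (--to-stdout) streams a member without writing it to disk. A sketch:
firstfile=$(tar -tf demo.tar.xz | head -n 1)
tar -xOf demo.tar.xz "$firstfile" | head
Note that head -n 1 may return a directory entry for archives that store directories first; adjust accordingly.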

Extract text files in each subfolder and join them with the subfolder name

I have compressed text files in the following folder structure:
~/A/1/1.faa.tgz #each tgz file has dozens of faa text files
~/A/2/2.faa.tgz
~/A/3/3.faa.tgz
I would like to extract the faa files (text) from each tgz file and then join them using the subfoldername (1,2 and 3) to create a single text file for each subfolder.
My attempt was the following, but the files were extracted into the folder where I ran the script:
#!/bin/bash
for FILE in ~/A/*/*.faa.tgz; do
tar -vzxf "$FILE"
done
After extracting the faa files I would use cat to join them (for example, cat *.faa > 1.txt inside subfolder 1).
Thanks in advance.
To extract:
#!/bin/bash
for FILE in ~/A/*/*.faa.tgz; do
    echo "$FILE"
    # extract into the subfolder the archive lives in,
    # not into the directory the script is run from
    tar -vzxf "$FILE" -C "$(dirname "$FILE")"
done
To join:
#!/bin/bash
for dir in ~/A/*/; do
    (
    cd "$dir" || exit
    file=( *.faa )
    cat "${file[@]}" > "${PWD##*/}.txt"
    )
done
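Both steps can also be folded into one loop over the subfolders. A sketch, assuming one .faa.tgz per subfolder and that the extracted .faa files should land next to their archive:
#!/bin/bash
for dir in ~/A/*/; do
    tar -xzf "$dir"*.faa.tgz -C "$dir"    # extract next to the archive
    name=$(basename "$dir")               # subfolder name: 1, 2, 3, ...
    cat "$dir"*.faa > "$dir$name.txt"     # join into <subfolder>.txt
done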

Unpack tar.gz folder to part of filename

I have a file dagens_130325.tar.gz containing the folder dagens. In one folder I have hundreds of these daily archives. I would like to unpack the dagens folder from dagens_130325.tar.gz into 130325, with all the files inside, then 130326, and so on.
Is there a way to do it?
Not sure this is the right stack for this kind of question; however, try
tar -zxvf dagens_130325.tar.gz -C /tmp/130325 dagens
This way, the folder dagens from the archive dagens_130325.tar.gz is extracted into /tmp/130325. Note, however, that the target folder must already exist, otherwise the command will fail.
So, supposing you have 4 archives named dagens_1.tar.gz, dagens_2.tar.gz, ..., you can write an extract.sh file containing
#!/bin/bash
for i in {1..4}
do
    mkdir "/tmp/$i"
    FILE="dagens_$i.tar.gz"
    tar -zxvf "$FILE" -C "/tmp/$i" dagens
done
Once this file has execute permission and sits in the same folder as your archives, running it should produce the result you asked for.
This was the solution I came up with in the end:
#!/bin/bash
search_dir=/yourdir/with/tar.gz
for entry in "$search_dir"/*.tar.gz
do
    substring=$(basename "$entry")
    echo "$substring"
    sub2=${substring:7:6}    # e.g. 130325 out of dagens_130325.tar.gz
    tar -xvzf "$entry"       # use the full path so this works outside search_dir
    rm -rf "$sub2"
    mv dagens "$sub2"
done
Use:
#!/bin/bash
for file in dagens_*.tar.gz
do
    from=${file%_*}    # strips the _ and everything after it: dagens
    to=${file#*_}      # strips everything up to and including the _: 130325.tar.gz
    to=${to%.t*}       # strips the .tar.gz suffix: 130325
    tar -zxf "$file" --show-transformed --transform "s/$from/$to/"
done
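You can preview what the transform will do without extracting anything by listing the archive with the same options, for example:
tar -tzf dagens_130325.tar.gz --show-transformed-names --transform "s/dagens/130325/"
Every member that started with dagens/ is shown (and, with -x, extracted) under 130325/ instead.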
