How to extract part of compressed file from tar archive

Given an archive compressed.tar containing files file1 and file2, I know I can extract a single file like so: tar xf compressed.tar file1. However, if file1 is a large file and I'm only interested in getting a hexdump of the first 8 bytes of the file, is there any way to extract just the first 8 bytes of the file instead of the whole thing?
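One possible approach, sketched here in Python rather than with the tar command itself, is to open the archive with the standard tarfile module and read only the first few bytes of the member. The helper name head_of_member is hypothetical; the archive and member names are the ones from the question. For an uncompressed archive this does not write the member to disk; for a gzip- or bzip2-compressed archive the data in front of the member still has to be decompressed sequentially.

```python
import tarfile


def head_of_member(archive_path: str, member: str, n: int = 8) -> bytes:
    """Return the first n bytes of one member without extracting it to disk."""
    with tarfile.open(archive_path, "r") as tar:
        # extractfile() returns a read-only file object, or None when the
        # member is not a regular file (for example a directory).
        fobj = tar.extractfile(member)
        if fobj is None:
            raise ValueError(f"{member!r} is not a regular file")
        with fobj:
            return fobj.read(n)
```

Calling head_of_member("compressed.tar", "file1").hex(" ") would then give a simple hexdump of those 8 bytes.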

Related

Converting AI to EPS / JPEG without "Use artboards"

I'm trying to read (preview) AI (Adobe Illustrator) files in my web application. My web app runs on a Linux machine and mainly uses Python.
I couldn't find any native Python code that can preview an AI file, so I kept searching and found Ghostscript, which can convert AI to JPG/PNG, and I have no problem previewing those formats.
The issue is that I need the preview to include the whole document and not just the artboard. In Illustrator this is possible by unchecking "Use artboards" when saving; see the screenshot: https://helpx.adobe.com/content/dam/help/en/illustrator/how-to/export-svg/_jcr_content/main-pars/image0/5286-export-svg-fig1.jpg
But when I try to export from Ghostscript, I can't make it work.
From my understanding, it's best to first convert to EPS and then from that to JPG/PNG, but that failed as well: the items outside the artboard do not show up.
On Linux, these are the commands I tried after installing Ghostscript:
gs -dNOPAUSE -dBATCH -sDEVICE=eps2write -sOutputFile=out.eps input.ai
gs -dNOPAUSE -dBATCH -sDEVICE=jpeg -r300 -sOutputFile=out.jpeg input.ai
gs -dNOPAUSE -dBATCH -sDEVICE=pngalpha -r300 -sOutputFile=out.png input.ai
If it's not possible with Ghostscript and I need ImageMagick instead, I don't mind using it. I tried it for 10 minutes and just got a bunch of errors, so I gave up on it.
AI file for example: https://drive.google.com/open?id=1UgyLG_-nEUL5FLTtD3Dl281YVYzv0mUy
Jpeg example of the output I want: https://drive.google.com/open?id=1tLT2Uj1pp1gKRnJ8BojPZJxMFRn6LJoM
Thank you

Some updates on the topic: I've found this:
https://gist.github.com/moluapple/2059569
This is an AI PGF extractor, which should theoretically help extract the additional data from the PDF. It seems quite old and was written for win32, so I cannot test it at the moment, but it's at least some kind of lead.

Firstly, Adobe Illustrator native files are not technically supported by Ghostscript at all. They might work, because they are normally either PostScript or PDF files with custom bits that can be ignored for the purposes of drawing the content. But it's not a guarantee.
Secondly; no, do not multiply convert the files! That's a piece of cargo-cult mythology that's been doing the rounds for ages. There are sometimes reasons for doing so but in general this will simply magnify problems, not solve them. Really, don't do that.
You haven't quoted the errors you are getting and you haven't supplied any files to look at, so it's not really possible to tell what your problem is. I have no clue what an 'artboard' is, and a picture of the Illustrator dialog doesn't help.
Perhaps if you could supply an example file, and maybe a picture of what you expect, it might be possible to figure it out. My guess is that your '.ai' file is a PDF file, and that it has a MediaBox (which is what Ghostscript uses by default) and an ArtBox which is what you actually want to use. Or something like that. Hard to say without more information.
Edit
Well, I'm afraid the answer here is that you can't easily get what you want from that file without using Illustrator.
The file is a PDF file (if you rename input.ai to input.pdf then you can open it with a PDF reader). But Illustrator doesn't use most of the PDF file when it opens it. Instead the PDF file contains a '/PieceInfo' key, which is a key in the Page dictionary. That points to a dictionary which has a /Private key, which (finally!) points to a dictionary with a bunch of Illustrator stuff:
52 0 obj
<<
/AIMetaData 53 0 R
/AIPrivateData1 54 0 R
/AIPrivateData10 55 0 R
/AIPrivateData11 56 0 R
/AIPrivateData2 57 0 R
/AIPrivateData3 58 0 R
/AIPrivateData4 59 0 R
/AIPrivateData5 60 0 R
/AIPrivateData6 61 0 R
/AIPrivateData7 62 0 R
/AIPrivateData8 63 0 R
/AIPrivateData9 64 0 R
/ContainerVersion 11
/CreatorVersion 23
/NumBlock 11
/RoundtripStreamType 1
/RoundtripVersion 17
>>
endobj
That's the actual saved file format of the Illustrator file. You can think of the PDF file as a 'preview' wrapped around the Illustrator native file. Illustrator reads the PDF file to find its own data, then throws the PDF file away and uses the native file format stored within it instead.
The problem is that the PDF part of the file simply doesn't contain the content you want to see. That's stored in the Illustrator native data. Ghostscript just renders what's in the PDF file, it doesn't look at the Illustrator native file.
Looking at the Illustrator private data, some of it is uncompressed, but most is compressed. It doesn't say how it is compressed, but applying the FlateDecode filter produces a good old-fashioned Illustrator PostScript file, one that will work with Ghostscript.
But you would have to manually parse the PDF file, extract all the compressed AIPrivateData streams, concatenate them together, apply the FlateDecode filter to decompress them, and only then send the resulting output to Ghostscript with the -dEPSCrop switch set. That will result in the output you want.
But neither Ghostscript nor ImageMagick (which generally uses Ghostscript to render PDF files) will do any of that for you, you would have to do it all yourself.
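As a rough illustration of the concatenate-and-decompress step only, here is a minimal Python sketch. It assumes the raw AIPrivateData stream bodies have already been pulled out of the PDF by some other means (that parsing is not shown), and it assumes the streams belong together in numeric order (1, 2, ... 11), which is not the order in which the dictionary above lists them. Both function names are hypothetical. FlateDecode corresponds to zlib's deflate format, so the standard zlib module is used.

```python
import re
import zlib


def order_private_data_keys(keys):
    """Sort keys such as /AIPrivateData1 ... /AIPrivateData11 by their number.

    Assumption: the numeric suffix gives the concatenation order.
    """
    def index(key):
        match = re.fullmatch(r"/?AIPrivateData(\d+)", key)
        if match is None:
            raise ValueError(f"unexpected key: {key!r}")
        return int(match.group(1))

    return sorted(keys, key=index)


def inflate_private_data(chunks):
    """Concatenate the raw stream bodies, then apply FlateDecode (zlib) once."""
    return zlib.decompress(b"".join(chunks))
```

The bytes returned by inflate_private_data would be what gets written to a file and handed to Ghostscript with -dEPSCrop, as described above.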

tar: streaming file size

I'm trying to know in advance the approximate size of a tar file while it's being streamed to stdout. According to the spec (http://www.gnu.org/software/tar/manual/html_node/Standard.html), the first 500 bytes are the header (ASCII-formatted), and bytes 124 to 136 specify the file size.
But, because it's streaming, those bytes always display 00000000 since I suppose filesize is calculated on-the-fly, or at the end.
tar -cf - myfolder | dd count=12 skip=124 iflag=count_bytes,skip_bytes > filesize
Always results in:
00000000000^#
I'm not using compression, so the tarball is roughly the size of the original data. Can tar somehow provide this information in the header before completion?
Thanks.

The answer is no, it cannot be calculated in advance while streaming.
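For context on the zeros seen in the question: in the tar format every archive member has its own 512-byte header block, and the 12-byte size field at offset 124 holds the size of that one member in octal, not the size of the whole archive. When a folder is archived, the first member is the directory entry itself, whose size is 0. The following Python sketch builds a small archive in memory and reads the size field of each header; the function names and the file name data.txt are made up for illustration.

```python
import io
import tarfile

BLOCK = 512  # tar headers and data are laid out in 512-byte blocks


def build_demo_tar() -> bytes:
    """Create an uncompressed ustar archive: one directory and one file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        folder = tarfile.TarInfo("myfolder")
        folder.type = tarfile.DIRTYPE
        tar.addfile(folder)
        payload = b"hello world"
        member = tarfile.TarInfo("myfolder/data.txt")
        member.size = len(payload)
        tar.addfile(member, io.BytesIO(payload))
    return buf.getvalue()


def member_sizes_from_headers(raw: bytes):
    """Walk the header blocks and decode the octal size field of each one."""
    sizes = []
    offset = 0
    while offset + BLOCK <= len(raw):
        header = raw[offset:offset + BLOCK]
        if header == bytes(BLOCK):  # an all-zero block marks the end
            break
        name = header[0:100].rstrip(b"\0").decode("utf-8")
        field = header[124:136].rstrip(b"\0 ").decode("ascii")
        size = int(field, 8) if field else 0
        sizes.append((name, size))
        # Skip the header plus the member data, padded to a full block.
        offset += BLOCK + ((size + BLOCK - 1) // BLOCK) * BLOCK
    return sizes
```

Here the directory entry comes first and reports a size of 0, and the file that follows reports the length of its own payload; no field anywhere carries the total archive size.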

How to preserve timestamps in tshark output file?

I'm using tshark to extract specific TCP streams and write that to an output pcap file using the -w option.
But, the frames in the output pcap do not have any timestamps or delta times (they're all zero while in the original pcap there are timestamps and delta times for the frames).
Is there any way to ensure that the original timestamps (from the original pcap file) are preserved in the output pcap?
I'm using TShark 1.10.5 (SVN Rev 54262 from /trunk-1.10) on MacOS.
Thanks!

the frames in the output pcap do not have any timestamps or delta times (they're all zero while in the original pcap there are timestamps and delta times for the frames).
That is what is technically known as a "bug". Please file it as a bug on the Wireshark Bugzilla; if you can attach your original pcap file for testing purposes, that would be good. (If not, please run the file command on it and show the results, just so we know what file type the input file is - it might, for example, be a pcap-ng file rather than a pcap file).

ImageMagick to verify image integrity

I'm using ImageMagick (with Wand in Python) to convert images and to get thumbnails from them. However, I noticed that I need to verify whether a file is an image or not ahead of time. Should I do this with Identify?
I would assume that checking the integrity of a file requires reading the whole file into memory. Would it be better to just try to convert the file, so that if there is an error we know the file wasn't good?

seems like you answered your own question
$ ls -l *.png
-rw-r--r-- 1 jsp jsp 526254 Jul 20 12:10 image.png
-rw-r--r-- 1 jsp jsp 10000 Jul 20 12:12 image_with_error.png
$ identify image.png &> /dev/null; echo $?
0
$ identify image_with_error.png &> /dev/null; echo $?
0
$ convert image.png /dev/null &> /dev/null ; echo $?
0
$ convert image_with_error.png /dev/null &> /dev/null ; echo $?
1

If you specify the -regard-warnings flag with the ImageMagick identify tool
magick identify -regard-warnings myimage.jpg
it will throw an error if there are any warnings about the file. This is good for checking images, and seems to be a lot faster than using -verbose.
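Since the question mentions Python, one way this check might be driven from a script is through the standard subprocess module. This is only a sketch: the function name passes_identify is hypothetical, and it assumes the error is reflected in a non-zero exit status of the command.

```python
import subprocess


def passes_identify(path, command=("magick", "identify", "-regard-warnings")):
    """Return True if the command exits with status 0 for the given file.

    Assumption: a warning or error about the file yields a non-zero status.
    """
    result = subprocess.run(
        [*command, path],
        stdout=subprocess.DEVNULL,  # only the exit status is of interest
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0
```

The command is a parameter so that installations which only provide the plain identify binary can pass ("identify", "-regard-warnings") instead.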

If you use Python, you can also consider the Pillow module.
In my experiments, I have used both the Python Pillow module (PIL) and the ImageMagick wrapper Wand (for the psd and xcf formats) to detect broken images; the original answer with code snippets is here.
Update:
I also implemented this solution in my Python script here on GitHub.
I also verified that damaged files (jpg) are frequently not 'broken' images, i.e., a damaged picture file sometimes remains a legitimate picture file: the original image is lost or altered, but you are still able to load it.
End Update
I quote the full answer for completeness:
You can use the Python Pillow (PIL) module, with most image formats, to check whether a file is a valid and intact image file.
If you also aim to detect broken images, @Nadia Alramli correctly suggests the im.verify() method, but this does not detect all possible image defects; e.g., im.verify() does not detect truncated images (which most viewers load with a greyed area).
Pillow is able to detect these types of defects too, but you have to apply an image manipulation or an image decode/recode in order to trigger the check. Finally, I suggest using this code:
from PIL import Image

try:
    im = Image.open(filename)
    im.verify()  # I also run verify; I don't know whether it catches other types of defects
    im.close()  # reopening the file is necessary in my case
    im = Image.open(filename)
    im.transpose(Image.FLIP_LEFT_RIGHT)
    im.close()
except Exception:
    # manage exceptions here
    pass
In case of image defects this code will raise an exception.
Please consider that im.verify is about 100 times faster than performing the image manipulation (and I think that flip is one of the cheaper transformations).
With this code you are going to verify a set of images at about 10 MBytes/sec (using single thread of a modern 2.5Ghz x86_64 CPU).
For the other formats (psd, xcf, ...) you can use the ImageMagick wrapper Wand; the code is as follows:
import wand.image

im = wand.image.Image(filename=filename)
im.flip()
im.close()
But from my experiments, Wand does not detect truncated images; I think it loads the missing parts as a greyed area without warning.
I read that ImageMagick has an external command, identify, that could do the job, but I have not found a way to invoke it programmatically and I have not tested this route.
I suggest always performing a preliminary check first. Checking that the file size is not zero (or very small) is very cheap:
import os

statfile = os.stat(filename)
filesize = statfile.st_size
if filesize == 0:
    # manage the 'faulty image' case here
    pass

Here's another solution using identify, but without convert:
identify -verbose *.png 2>&1 | grep "corrupt image"
identify: corrupt image 'image_with_error.png' @ error/png.c/ReadPNGImage/4051.

I use identify:
$ identify image.tif
00000005.tif TIFF 4741x6981 4741x6981+0+0 8-bit DirectClass 4.471MB 0.000u 0:00.010
$ echo $?

Using Delphi or FFMpeg to create a movie from image sequence

My Delphi app has created a sequence of files named frame_001.png to frame_100.png.
I need that to be compiled into a movie clip. I think perhaps the easiest way is to call ffmpeg from the command line. According to their documentation:
For creating a video from many images:
ffmpeg -f image2 -i foo-%03d.jpeg -r 12 -s WxH foo.avi
The syntax foo-%03d.jpeg specifies to use a decimal number composed of three digits padded with zeroes to express the sequence number. It is the same syntax supported by the C printf function, but only formats accepting a normal integer are suitable.
From: http://ffmpeg.org/ffmpeg-doc.html#SEC5
However, my files are in (lossless) PNG format, so I have to convert them using ImageMagick first.
My command line is now:
ffmpeg.exe -f image2 -i c:\temp\wentelreader\frame_%05d.jpg -r 12 foo.avi
But then I get the error:
[image2 @ 0x133a7d0]Could not find codec parameters (Video: mjpeg)
c:\temp\wentelreader\Frame_C:\VID2EVA\Tools\Mencoder\wentel.bat5d.jpg: could not
find codec parameters
What am I doing wrong?
Alternatively can this be done easily with Delphi?

Not sure if you are interested, but there are Delphi headers for this at http://ultrastardx.svn.sourceforge.net/viewvc/ultrastardx/trunk/src/lib/ffmpeg/
So you can use the DLL instead of the command line.
-Brad

Look at the file name in the error message. That can't possibly be right. The percent sign needs to get all the way to the program you're running, but it's being expanded by the batch file instead, where %0 expands to the full name and path of the file. Double the percent sign in the batch file:
ffmpeg.exe -f image2 -i c:\temp\wentelreader\frame_%%05d.jpg -r 12 foo.avi
Also, why do you want five digits when you've already said your files are named like frame_001.png, which has only three digits?
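The digit-count mismatch can be seen by expanding the printf-style pattern directly. Python's % operator follows the same integer formatting rules as C printf for %03d and %05d, so this small sketch (the helper name frame_names is hypothetical) lists the file names each pattern would stand for:

```python
def frame_names(pattern: str, first: int, last: int) -> list:
    """Expand a printf-style pattern such as frame_%03d.png for a frame range."""
    return [pattern % number for number in range(first, last + 1)]


three_digits = frame_names("frame_%03d.png", 1, 100)
five_digits = frame_names("frame_%05d.png", 1, 100)
```

The three-digit pattern covers frame_001.png through frame_100.png, while the five-digit pattern expands to names such as frame_00001.png, which do not exist in the folder described in the question.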

ffmpeg can create a movie from png images, why do you think you have to convert them to jpeg?

The people at DelphiFFMpeg have produced a component wrapper for FFmpeg. It's very expensive, but it's worth testing. However, what you want to do is very simple, and the command line is more than enough for it.
