How to generate thumbnail for PDF file without downloading it fully? - ios

I have to work with an external REST API that allows browsing a document library - listing docs, getting metadata for individual docs, and downloading documents either fully or as a given byte range.
Currently we show standard icons for all documents (the files on the server are PDFs).
We want to improve on this and show thumbnails.
Is there a way of extracting a thumbnail of the cover page from a PDF without reading the whole file? Something similar to EXIF, maybe? The client runs on iOS.

Not sure if I fully understand your environment and your limitations.
However, if you can retrieve a 'given range' of a remote document, then it's easy to just retrieve page 1. (You can only retrieve parts of PDF documents that will still render successfully if the documents are "web optimized", a.k.a. "linearized".)
Note, however, that nowadays most PDFs no longer contain embedded thumbnails that could be retrieved. Adobe software (as well as other PDF viewers) creates the page previews on the fly.
So you must retrieve the first page first.
Then Ghostscript can generate a "thumbnail" from this page. Command for Linux/Unix/macOS:
gs \
-o thumb.jpg \
-sDEVICE=jpeg \
-g80x120 \
-dPDFFitPage \
firstpage.pdf
Command for Windows:
gswin32c.exe ^
-o thumb.jpg ^
-sDEVICE=jpeg ^
-g80x120 ^
-dPDFFitPage ^
firstpage.pdf
For this example...
...the thumbnail file type will be JPEG. You can change this to PNG (-sDEVICE=pngalpha, or =png256 or =png16m).
...the thumbnail size will be 80x120 pixels; change it to whatever you need.
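As a rough end-to-end sketch, assuming the server honors HTTP Range requests and the PDF is linearized (the URL and the 512 KB range below are only placeholders, not part of your actual API):
# fetch only the first 512 KB of the remote PDF via an HTTP Range request
curl -r 0-524287 -o firstpage.pdf "https://example.com/documents/123/content"
# render page 1 of the partial file as an 80x120 JPEG thumbnail
gs -o thumb.jpg -sDEVICE=jpeg -dFirstPage=1 -dLastPage=1 -g80x120 -dPDFFitPage firstpage.pdf
How many bytes you actually need depends on the document; linearization puts the first page's objects at the front of the file, which is what makes the partial fetch workable.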

Related

GhostScript : Missing digital signature when printing

We had to replace our signature pad with another product because the old one was discontinued and pens for it were very hard to find. We bought a Topaz GemView Tablet Display.
When a customer signs on the pad, our custom application watches a folder for the signed PDF and prints a paper copy for our back-office staff and one for the customer if they want it.
Our custom application uses Ghostscript to send the PDF to a specific printer.
Everything worked fine with the old signature pad and Ghostscript 9.16 on Windows 2012.
With the Topaz pad, the PDF prints, but there is no signature.
I have updated Ghostscript to the latest version, 9.53.3 - still no signature.
Here is a link to a sample signed PDF:
https://wetransfer.com/downloads/997a149ab09640d523397248ae6b161020210127144440/e5adad1b76799726522899389fe9415620210127144513/21e3e8
Here is the command line that I use to send the PDF to the printer:
gswin64c.exe -dPrinted -dBATCH -dNOPAUSE -dNOSAFER -dNumCopies=1 -sPAPERSIZE=letter -sDEVICE=mswinpr2 -sOutputFile="\\spool\\\srv\ColorPaper" "Signed.Pdf"
If I remove all parameters, we can see the signature on screen:
gswin64c.exe "Signed.Pdf"
Can Ghostscript print digital signatures on paper?
Thank you
As chrisl says in his comment, changing the parameter "-dPrinted" to "-dPrinted=false" solves this issue.
The signature field in the PDF is flagged as "not printable".
When using "-dPrinted=false", Ghostscript prints the PDF as it appears on screen.

youtube-dl -citw downloads only one video instead of all

I used
user@ubuntu:~/Arbeitsfläche$ sudo youtube-dl -citw ytuser: raz\ malca
to download all videos from this channel. But it downloads only one:
user@ubuntu:~/Arbeitsfläche$ sudo youtube-dl -citw ytuser: raz\ malca
[generic] ytuser:: Requesting header
ERROR: Invalid URL protocol; please report this issue on https://yt-dl.org/bug . Be sure to call youtube-dl with the --verbose flag and include its complete output. Make sure you are using the latest version; type youtube-dl -U to update.
WARNING: Falling back to youtube search for raz malca . Set --default-search to "auto" to suppress this warning.
[youtube:search] query "raz malca": Downloading page 1
[download] Downloading playlist: raz malca
[youtube:search] playlist raz malca: Collected 1 video ids (downloading 1 of them)
[download] Downloading video #1 of 1
[youtube] Setting language
[youtube] R72vpqqNSDw: Downloading webpage
[youtube] R72vpqqNSDw: Downloading video info webpage
[youtube] R72vpqqNSDw: Extracting video information
I use the newest version of this program.
My system is Ubuntu 13.10 :)
Does anyone have an idea?
First of all, don't run youtube-dl with sudo - normal user rights suffice. There is also absolutely no need to pass in -ctw. But the main problem is that your command line contains a superfluous space after ytuser:. In any case, even if the space weren't there, raz malca is not a valid YouTube user ID. There is, however, a channel with that name. Simply pass that channel's URL to youtube-dl:
youtube-dl -i https://www.youtube.com/channel/UCnmQSqOPhkawAdndZgjfanA
To find out the channel's URL, inspect the user that created the channel. E.g. for the channel mix - o grande amor (Tom Jobim) - Solo Guitar by Chiba Kosei, right-click (in Firefox) on Chiba Kosei and select Inspect Element. Notice that the channel is in the href part. So copy that, and your final channel URL is https://youtube.com/channel/UC5q9SrhlCJ4YciPTLLNTR5w
<a href="/channel/UC5q9SrhlCJ4YciPTLLNTR5w" class="yt-uix-sessionlink g-hovercard spf-link " data-ytid="UC5q9SrhlCJ4YciPTLLNTR5w" data-sessionlink="itct=CDIQ4TkiEwjBvtSO7v3KAhWPnr4KHUYMDbko-B0">Chiba Kosei</a>

Downloading a YouTube video through Wget

I am trying to download YouTube videos through Wget. The first thing necessary is to capture the URL of the actual video resource. Suppose I want to download this video: http://www.youtube.com/watch?v=r-KBncrOggI. Opening up the page in the Firebug console reveals something like this:
The link I have circled looks like the link to the resource, for there we see only the video: http://www.youtube.com/v/r-KBncrOggI?version=3&autohide=1. However, when I try to download this resource with Wget, a 4 KB file named r-KBncrOggI#version=3&autohide=1 gets stored on my hard drive, and nothing else. What should I do to get the actual video?
And secondly, is there a way to capture different resources for videos of different resolutions, like 360p, 480p, etc.?
Here is one VERY simplified, yet functional version of the youtube-download utility I cited in my other answer:
#!/usr/bin/env perl
use strict;
use warnings;
# CPAN modules we depend on
use JSON::XS;
use LWP::UserAgent;
use URI::Escape;
# Initialize the User Agent
# YouTube servers are weird, so *don't* parse headers!
my $ua = LWP::UserAgent->new(parse_head => 0);
# fetch video page or abort
my $res = $ua->get($ARGV[0]);
die "bad HTTP response" unless $res->is_success;
# scrape video metadata
if ($res->content =~ /\byt\.playerConfig\s*=\s*({.+?});/sx) {
# parse as JSON or abort
my $json = eval { decode_json $1 };
die "bad JSON: $1" if $#;
# inside the JSON 'args' property, there's an encoded
# url_encoded_fmt_stream_map property which points
# to stream URLs and signatures
while ($json->{args}{url_encoded_fmt_stream_map} =~ /\burl=(http.+?)&sig=([0-9A-F\.]+)/gx) {
# decode URL and attach signature
my $url = uri_unescape($1) . "&signature=$2";
print $url, "\n";
}
}
Usage example (it returns several URLs to streams with different encoding/quality):
$ perl youtube.pl http://www.youtube.com/watch?v=r-KBncrOggI | head -n 1
http://r19---sn-bg07sner.c.youtube.com/videoplayback?fexp=923014%2C916623%2C920704%2C912806%2C922403%2C922405%2C929901%2C913605%2C925710%2C929104%2C929110%2C908493%2C920201%2C913302%2C919009%2C911116%2C926403%2C910221%2C901451&ms=au&mv=m&mt=1357996514&cp=U0hUTVBNUF9FUUNONF9IR1RCOk01RjRyaG4wTHdQ&id=afe2819dcace8202&ratebypass=yes&key=yt1&newshard=yes&expire=1358022107&ip=201.52.68.216&ipbits=8&upn=m-kyX9-4Tgc&sparams=cp%2Cid%2Cip%2Cipbits%2Citag%2Cratebypass%2Csource%2Cupn%2Cexpire&itag=44&sver=3&source=youtube,quality=large&signature=A1E7E91DD087067ED59101EF2AE421A3503C7FED.87CBE6AE7FB8D9E2B67FEFA9449D0FA769AEA739
I'm afraid it's not that easy to get the right link for the video resource.
The link you got, http://www.youtube.com/v/r-KBncrOggI?version=3&autohide=1, points to the player rather than the video itself. There is a Perl utility, youtube-download, which is well maintained and does the trick. This is how to get the HQ version (magic fmt=18) of that video:
stas@Stanislaws-MacBook-Pro:~$ youtube-download -o "{title}.{suffix}" --fmt 18 r-KBncrOggI
--> Working on r-KBncrOggI
Downloading `Sourav Ganguly in Farhan Akhtar's Show - Oye! It's Friday!.mp4`
75161060/75161060 (100.00%)
Download successful!
stas@Stanislaws-MacBook-Pro:~$
There might be better command-line YouTube Downloaders around. But sorry, one doesn't simply download a video using Firebug and wget any more :(
The only way I know to capture that URL manually is by watching the browser's active downloads:
The largest data chunks are the video data, so you can copy their URL:
http://s.youtube.com/s?lact=111116&uga=m30&volume=4.513679238953965&sd=BBE62AA4AHH1357937949850490&rendering=accelerated&fs=0&decoding=software&nsivbblmax=679542.000&hcbt=105.345&sendtmp=1&fmt=35&w=640&vtmp=1&referrer=None&hl=en_US&nsivbblmin=486355.000&nsivbblmean=603805.166&md=1&plid=AATTCZEEeM825vCx&ns=yt&ptk=youtube_none&csipt=watch7&rt=110.904&tsphab=1&nsiabblmax=129097.000&tspne=0&tpmt=110&nsiabblmin=123113.000&tspfdt=436&hbd=30900552&et=110.146&hbt=30.770&st=70.213&cfps=25&cr=BR&h=480&screenw=1440&nsiabblmean=125949.872&cpn=JlqV9j_oE1jzk7Zc&nsivbblc=343&nsiabblc=343&docid=r-KBncrOggI&len=1302.676&screenh=900&abd=1&pixel_ratio=1&bc=26131333&playerw=854&idpj=0&hcbd=25408143&playerh=510&ldpj=0&fexp=920704,919009,922403,916709,912806,929110,928008,920201,901451,909708,913605,925710,916623,929104,913302,910221,911116,914093,922405,929901&scoville=1&el=detailpage&bd=6676317&nsidf=1&vid=Yfg8gnutZoTD4G5SVKCxpsPvirbqG7pvR&bt=40.333&mos=0&vq=auto
However, for a large video this will only return a part of the stream, unless you figure out the URL query parameter responsible for the stream range to be downloaded and adjust it.
A bonus: everything changes periodically as YouTube is constantly evolving. So don't do this manually unless you crave pain.

Google PageSpeed & ImageMagick JPG compression

Given a user uploaded image, I need to create various thumbnails of it for display on a website. I'm using ImageMagick and trying to make Google PageSpeed happy. Unfortunately, no matter what quality value I specify in the convert command, PageSpeed is still able to suggest compressing the image even further.
Note that http://www.imagemagick.org/script/command-line-options.php?ImageMagick=2khj9jcl1gd12mmiu4lbo9p365#quality mentions:
For the JPEG ... image formats, quality is 1 [provides the] lowest image quality and highest compression ....
I actually even tested compressing the image using quality 1 (it produced an unusable image, though), and PageSpeed still suggests that I can optimize the image further by "losslessly compressing" it. I don't know how to compress the image any more using ImageMagick. Any suggestions?
Here's a quick way to test what I am talking about:
assert_options(ASSERT_BAIL, TRUE);
// TODO: specify valid image here
$input_filename = 'Dock.jpg';
assert(file_exists($input_filename));
$qualities = array('100', '75', '50', '25', '1');
$geometries = array('100x100', '250x250', '400x400');
foreach ($qualities as $quality)
{
    echo("<h1>$quality</h1>");
    foreach ($geometries as $geometry)
    {
        $output_filename = "$geometry-$quality.jpg";
        $command = "convert -units PixelsPerInch -density 72x72 -quality $quality -resize $geometry $input_filename $output_filename";
        $output = array();
        $return = 0;
        exec($command, $output, $return);
        echo('<img src="' . $output_filename . '" />');
        assert(file_exists($output_filename));
        assert($output === array());
        assert($return === 0);
    }
    echo('<br/>');
}
The JPEG may contain comments, thumbnails or metadata, which can be removed.
Sometimes it is possible to compress JPEG files more, while keeping the same quality. This is possible if the program which generated the image did not use the optimal algorithm or parameters to compress the image. By recompressing the same data, an optimizer may reduce the image size. This works by using specific Huffman tables for compression.
You may run jpegtran or jpegoptim on the generated file to reduce its size further.
To minimize the image sizes even more, you should remove all metadata. ImageMagick can do this if you add -strip to the command line.
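A minimal sketch of such a post-processing step, assuming jpegtran and/or jpegoptim are installed (file names are placeholders):
# strip metadata while resizing and recompressing with ImageMagick
convert -strip -quality 75 -resize 250x250 input.jpg thumb.jpg
# losslessly recompress with optimized Huffman tables, copying no markers
jpegtran -optimize -progressive -copy none thumb.jpg > thumb-optimized.jpg
# or, optimizing in place:
jpegoptim --strip-all thumb.jpg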
Have you also considered putting your thumbnail images into the HTML as inlined, base64-encoded data?
This can make your web page load much faster (even though the total size gets a bit larger), because it saves the browser from making a separate request for each image file referenced in the HTML code.
Your HTML code for such an image would look like this:
<IMG SRC="data:image/png;base64,
iVBORw0KGgoAAAANSUhEUgAAAM4AAABJAQMAAABPZIvnAAAABGdBTUEAALGPC/xh
BQAAAAFzUkdCAK7OHOkAAAAgY0hSTQAAeiYAAICEAAD6AAAAgOgAAHUwAADqYAAA
OpgAABdwnLpRPAAAAAZQTFRFAAAA/wAAG/+NIgAAAAF0Uk5TAEDm2GYAAAABYktH
RACIBR1IAAAACXBIWXMAAABIAAAASABGyWs+AAAB6ElEQVQ4y+3UQY7bIBQG4IeQ
yqYaLhANV+iyi9FwpS69iGyiLuZYpepF6A1YskC8/uCA7SgZtVI3lcoiivkIxu/9
MdH/8U+N6el2pk0oFyibWyr1Q3+PlO2NqJV+/BnRPMjcJ9zrfJ/U+zQ9oAvo+QGF
d+npPqFQn++TXElkrEpEJhAtlTBR6dNHUuzIMhFnEhxAmJDkKxlmt7ATXDDJYcaE
r4Txqtkl42VYSH+t9KrD9b5nxZeog/LWGVHprGInGWVQUTvjDWXca5KdsowqyGSc
DrZRlGlQUl4kQwpUjiSS9gI9VdECZhHFQ2I+UE2CHJQfkNxTNKCl0RkURqlLowJK
1h1p3sjc0CJD39D4BIqD7JvvpH/GAxl2/YSq9mtHSHknga7OKNOHKyEdaFC2Dh1w
9VSJemBeGuHgMuh24EynK03YM1Lr83OjUle38aVSfTblT424rl4LhdglsUag5RB5
uBJSJBIiELSzaAeIN0pUlEeZEMeClC4cBuH6mxOlgPjC3uLproUCWfy58WPN/MZR
86ghc888yNdD0Tj8eAucasl2I5LqX19I7EmEjaYjSb9R/G1SYfQA7ZBuT5H6WwDt
UAfK1BOJmh/eZnKLeKvZ/vA8qonCpj1h6djfbqvW620Tva36++MXUkNDlFREMVkA
AAAldEVYdGRhdGU6Y3JlYXRlADIwMTItMDgtMjJUMDg6Mzc6NDUrMDI6MDBTUnmt
AAAAJXRFWHRkYXRlOm1vZGlmeQAyMDEyLTA4LTIyVDA4OjM3OjQ1KzAyOjAwIg/B
EQAAAA50RVh0bGFiZWwAImdvb2dsZSJdcbX4AAAAAElFTkSuQmCC"
ALT="google" WIDTH=214 HEIGHT=57 VSPACE=5 HSPACE=5 BORDER=0 />
And you would create the base64 encoded image data like this:
base64 -i image.jpg -o image.b64
Google performs those calculations based on its WebP image format (https://developers.google.com/speed/webp/).
Despite the performance gains, though, WebP is currently supported only by Chrome and Opera (http://caniuse.com/webp).
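If you want to see what Google is comparing against, you can encode a thumbnail as WebP with the cwebp tool (a sketch, assuming cwebp is installed; file names are placeholders):
# encode a JPEG thumbnail as WebP at quality 75
cwebp -q 75 thumb.jpg -o thumb.webp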

How can I programmatically get the image on this page?

The URL http://www.fourmilab.ch/cgi-bin/Earth shows a live map of the Earth.
If I issue this URL in my browser (FF), the image shows up just fine. But when I try 'wget' to fetch the same page, I fail!
Here's what I tried first:
wget -p http://www.fourmilab.ch/cgi-bin/Earth
Thinking that probably all the other form fields are required too, I did a 'View Source' on the above page, noted down the various field values, and then issued the following URL:
wget --post-data "opt=-p&lat=7°27'&lon=50°49'&ns=North&ew=East&alt=150889769&img=learth.evif&date=1&imgsize=320&daynight=-d" http://www.fourmilab.ch/cgi-bin/Earth
Still no image!
Can someone please tell me what is going on here...? Are there any 'gotchas' with CGI and/or form-POST based wgets? Where (book or online resource) would such concepts be explained?
If you inspect the page's source code, there's an img tag inside whose src points to the image of the Earth. For example:
<img
src="/cgi-bin/Earth?di=570C6ABB1F33F13E95631EFF088262D5E20F2A10190A5A599229"
ismap="ismap" usemap="#zoommap" width="320" height="320" border="0" alt="" />
Without giving the 'di' parameter, you are just asking for the whole web page, with references to this image, not for the image itself.
Edit: the 'di' parameter encodes which "part" of the Earth you want to receive; anyway, try for example
wget http://www.fourmilab.ch/cgi-bin/Earth?di=F5AEC312B69A58973CCAB756A12BCB7C47A9BE99E3DDC5F63DF746B66C122E4E4B28ADC1EFADCC43752B45ABE2585A62E6FB304ACB6354E2796D9D3CEF7A1044FA32907855BA5C8F
Use GET instead of POST. They're completely different for the CGI program in the background.
Following on from Ravadre,
wget -p http://www.fourmilab.ch/cgi-bin/Earth
downloads an XHTML file which contains an <img> tag.
I edited the XHTML to remove everything but the img tag and turned it into a bash script containing another wget -p command, escaping the ? and =.
When I executed this I got a 14 kB file, which I renamed earth.jpg.
Not really programmatic, the way I did it, but I think it could be done.
But as @somedeveloper said, the di value keeps changing (since it depends on time).
Guys, here's what I finally did. I'm not fully happy with this solution, as I was (and still am) hoping for a better way... one that gets the image on the first wget itself, giving me the same user experience I get when browsing via Firefox.
#!/bin/bash
tmpf=/tmp/delme.jpeg
base=http://www.fourmilab.ch
liveurl=$(wget -O - $base/cgi-bin/Earth?opt=-p 2>/dev/null | perl -0777 -nle 'if(m#<img \s+ src \s* = \s* "(/cgi-bin/Earth\?di= .*? )" #gsix) { print "$1\n" }' )
wget -O $tmpf $base/$liveurl &>/dev/null
What you are downloading is the whole HTML page and not the image. To download the image and other elements too, you'll need to use the --page-requisites (and possibly --convert-links) parameter(s). Unfortunately because robots.txt disallows access to URLs under /cgi-bin/, wget will not download the image which is located under /cgi-bin/. AFAIK there's no parameter to disable the robots protocol.
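For completeness, a sketch of the invocation those parameters describe (as noted, the image under /cgi-bin/ will still be skipped because of robots.txt):
wget --page-requisites --convert-links http://www.fourmilab.ch/cgi-bin/Earth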
