Convert HTML to TIFF or printable poster - ruby-on-rails

I want to build a website where people type a sentence, and I make a poster out of it, print it, and send it to them.
I know I can make a box with HTML divs, color it, and use a web font, but my questions are:
How do I go from HTML to TIFF? (I've read that TIFF is the best format for poster printing.)
Given that DPI on the web is a lot lower than for posters, how do I increase the DPI when generating the poster? Can I use some sort of library on the server?
What are the drawbacks of using web fonts, provided the font size is large enough?
Also, how do companies like Zazzle, Mixbook, and Shutterfly go about putting text on an image and printing it large?
My original plan was to use Rails. Are there any useful gems that can help me?
In "Convert Html to a Printable Image", people advise converting to PDF. Wouldn't that destroy the quality of the poster?
Any other advice would be appreciated.

I think IMGKit will be useful for you: https://github.com/csquared/IMGKit
Also look at ImageMagick (http://www.imagemagick.org/) for image processing. It has a Ruby wrapper, RMagick: http://rmagick.rubyforge.org/
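On the DPI part of the question, the arithmetic is simple enough to sketch. This is only an illustration, not something IMGKit or ImageMagick does for you: the 18 x 24 inch poster size and the 300 DPI print target are assumptions picked for the example, and the function names are made up. The 96 px per inch figure is the CSS reference pixel.

```python
# Rough sizing helper: how many pixels a rendered poster needs for print.
# Assumptions (not from the question): 300 DPI target, 18 x 24 inch poster.

CSS_PX_PER_INCH = 96  # CSS defines 1in as 96px


def poster_pixels(width_in, height_in, dpi=300):
    """Return (width, height) in pixels needed to print at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def render_scale(target_dpi, screen_dpi=CSS_PX_PER_INCH):
    """Factor by which an HTML layout must be scaled up before rasterizing."""
    return target_dpi / screen_dpi


if __name__ == "__main__":
    print(poster_pixels(18, 24))  # (5400, 7200)
    print(render_scale(300))      # 3.125
```

So a layout designed at screen size would need to be rendered roughly three times larger in each direction before being saved as TIFF, which is why text set in real fonts (scalable) fares better than pre-rendered bitmaps.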

Related

How to remove spot color (s) from an image

Is there a command line tool to remove all spot color channels from a vector input image (the type can be ai or eps) and keep only the CMYK or RGB color channels?
What I've been able to come up with so far is using the Ghostscript tiffsep device and then recombining the color channel images into one image using ImageMagick's -combine option. The drawback of this method is that it is quite complicated and I end up with a TIFF image instead of the original (vector) format.
'Image' has a defined meaning in PostScript, it means a bitmap, a raster. I think, from the context, that you mean something more general.
The simple answer is no, in general you can't do this, and I don't know of any tool which will.
The reason is that to do so would lose information; the marks defined in Separation or DeviceN space would be lost entirely, and it's generally regarded as a Bad Idea to discard random parts of the document.
Perhaps you could explain what you are trying to achieve with this (ie why are you doing this), and it might be possible to suggest an alternative method.
If you are a competent C programmer you could produce a Ghostscript subclass device using the existing FILTER device (in gdevflt.c) as a template. That device looks at the type of operation, and either passes it on to the output device, or throws it away. It would be reasonably simple to look at the current colour space and discard anything in Separation or DeviceN space. If you then use the pdfwrite/ps2write/eps2write output device you'd get a PDF file, PostScript program or EPS as the output.
Whether you go down this route, continue with what you have, or find an alternative approach, there are a couple of things you need to think about. How do you plan to tackle Separation inks with process colour names? E.g. /Separation /Black. What about DeviceN spaces where some of the inks are process colours? E.g. a duotone of Black and a Pantone ink. Should these be preserved or discarded?
Your current approach will use the parts of the object which mark process plates, but not those which mark spot colours, which could give some very peculiar results.
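To make those edge cases concrete, here is a small sketch of the decision you would have to make for each Separation or DeviceN space. It is not Ghostscript code and not how any existing device behaves; the function names and the keep/discard/mixed policy are purely hypothetical, and the ink names in the usage lines are examples.

```python
# Hypothetical policy sketch: classify the inks of a Separation/DeviceN space.
# Nothing here is a Ghostscript API; it only illustrates the edge cases.

PROCESS_INKS = {"Cyan", "Magenta", "Yellow", "Black"}


def split_inks(ink_names):
    """Split ink names into (process, spot) lists, preserving order."""
    process = [name for name in ink_names if name in PROCESS_INKS]
    spot = [name for name in ink_names if name not in PROCESS_INKS]
    return process, spot


def decide(ink_names):
    """Return 'keep', 'discard' or 'mixed' for one colour space."""
    process, spot = split_inks(ink_names)
    if not spot:
        return "keep"      # e.g. /Separation /Black: really a process plate
    if not process:
        return "discard"   # pure spot colour
    return "mixed"         # e.g. duotone of Black plus a spot ink


if __name__ == "__main__":
    print(decide(["Black"]))                    # keep
    print(decide(["PANTONE 185 C"]))            # discard
    print(decide(["Black", "PANTONE 185 C"]))   # mixed
```

The "mixed" case is the awkward one: dropping the whole object loses the black component, while keeping only the black component changes the appearance of the object.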
[EDIT]
PDF, PostScript and EPS don't have 'layers' (PDF has a feature, Optional Content, which uses the term 'layers' as a description in the specification but that's all).
An application such as Photoshop and Illustrator can have layers, but in general what they export to has to have those 'layers' converted into something else. That 'something else' depends on what you are saving it as.
Part of the problem is that you are apparently trying to deal with 3 different kinds of input: you say Illustrator (PDF, more or less), Photoshop (raster image) and EPS (PostScript). There is little common ground between the 3. Is there a reason to support all of them?
If you are content to stick with just Illustrator you might be able to do something with Optional Content. I'm not terribly familiar with modern versions of Illustrator, but wouldn't it be simpler to save two versions of the file, one with the answer layer and one without ?
Anyway, Ghostscript can honour Optional Content, so if you can save a PDF file (not PostScript or EPS) from Illustrator, it may be that the layers will persist into the PDF as Optional Content. I suspect they will going by a quick Google. In that case you might be able to run the file through Ghostscript, telling it not to honour the Optional Content portion, and get a PDF file without it present.
Another solution (again limited to PDF) would be to open the PDF file with an editing application such as Acrobat Pro, and simply delete the bits you don't want. Deletion of that kind is relatively reliable.
It still feels like rather a long-winded way to get a PDF file with some of the content removed though. I can't help feeling that just saving two versions from the creating application would be easier.

Converting an image to Doc

I am trying to make an application which produces an editable document file (doc or pdf) from an image. I am planning to use Tesseract to extract the text, but I am not yet sure how to get the basic formatting of the text (size, bold, italic, underline) and any images that might be present in the document image. I am planning to build a web-based app in J2EE (I have to use J2EE). I think I might be able to recognize the components and formatting of the document using OpenCV, but I am not really sure.
Given that you are planning to use Tesseract for the basic OCR capabilities, try looking into the hOCR formatted output. That includes quite a lot of additional information about font size, font face, position, etc.
You can find a description of hOCR here:
https://docs.google.com/document/d/1QQnIQtvdAC_8n92-LhwPcjtAUFwBlzE8EWnKAxlgVf0/preview#heading=h.e903b9bca924
If that doesn't work out, it depends on how much effort you want to put into Tesseract. Its internal APIs (available in Java via Tess4J, among others) do provide much of the information that you would need to reconstruct the page layout.
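As a rough idea of what working with hOCR looks like: it is HTML in which each recognised word is a span of class ocrx_word whose title attribute carries a bounding box. The sketch below (Python standard library only, just to show the idea; the sample markup is hand-written for illustration, not real Tesseract output) pulls out each word with its box. The box height can serve as a crude stand-in for font size.

```python
from html.parser import HTMLParser


def parse_bbox(title):
    """Extract (x0, y0, x1, y1) from an hOCR title such as 'bbox 1 2 3 4; ...'."""
    for prop in title.split(";"):
        parts = prop.split()
        if len(parts) == 5 and parts[0] == "bbox":
            return tuple(int(p) for p in parts[1:])
    return None


class HocrWords(HTMLParser):
    """Collect (text, bbox) for every span with class 'ocrx_word'."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._in_word = False
        self._bbox = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            self._in_word = True
            self._bbox = parse_bbox(attrs.get("title") or "")
            self._text = []

    def handle_data(self, data):
        if self._in_word:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Inline tags nested in a word (e.g. <strong>) are skipped here.
        if tag == "span" and self._in_word:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._in_word = False


# Hand-written sample in hOCR style (illustrative only).
SAMPLE = """<span class='ocr_line' title='bbox 36 92 580 116'>
<span class='ocrx_word' title='bbox 36 92 96 116; x_wconf 96'>Hello</span>
<span class='ocrx_word' title='bbox 109 92 185 116; x_wconf 95'><strong>world</strong></span>
</span>"""

if __name__ == "__main__":
    parser = HocrWords()
    parser.feed(SAMPLE)
    for text, (x0, y0, x1, y1) in parser.words:
        print(text, "height:", y1 - y0)
```

In a J2EE app the same approach applies with any HTML parser on the Java side; the point is only that position and size come for free in the hOCR output.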

Derive Text, Images, and LaTeX Equations from Websites

Would it be possible to derive the text, images, and LaTeX equations from a particular website so that you can directly customize your own PDF without having the objects blurry? Only the image will have a fixed resolution.
I realize that there are a couple of ways of generating a PDF indirectly. Rendering a PDF of the Wolfram MathWorld page on the Riemann Zeta Function, for instance, is possible by printing and saving it as a PDF via Chrome, but as you zoom in more closely, the LaTeX equations and text naturally become blurry. I tried downloading "Wolfram's CDF Player," but it contains only the syntax for Mathematica's libraries, not the helpful explanations that Wolfram MathWorld provides. What would be required for me to extract the text, images, and LaTeX equations into a PDF file without having them blurry?
Unless you have access to the LaTeX source that was used to produce the images in a way that isn't apparent from your question, the answer is "you cannot." Casual inspection of the website linked implies that the LaTeX that is used to produce the equations is not readily available (it's probably on a backend system somewhere that produces the images that get put on the web server).
To a browser, it's just an image. The method by which the image was produced is irrelevant to how it appears on the web page, and how it would appear in a PDF (ie. more pixelated than desired).
Note that if a website uses a vector-graphics format like SVG instead of a pixel based format like PNG or JPEG, then those will translate to PDF cleanly, and will zoom nicely. That's a choice that would be made by the webmaster of the site in question.
Inspecting the source reveals that the GIFs depicting each equation have alt-text that approximates the LaTeX that would render them (it might be Mathematica code; I'm not familiar with Wolfram's tools). Extracting a reasonable source wouldn't be impossible, but it would be hard. The site is laid out with tables, so even with something like Beautiful Soup, parsing the HTML could be tricky. Some equations are broken up into different GIFs, so parsing them would be even trickier. You'd also have to convert from whatever the alt-text is to LaTeX.
All in all, if you don't need to do a zillion pages, I'd suggest copy-pasting the text, saving the images, grabbing the alt-text of each image and doing the converting yourself.
For the given example, you could download the Mathematica notebook for that page. Maybe it is possible to parse something from that.
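If you do want to automate the "grab the alt-text of each image" step, a minimal sketch could look like the following. It uses only the Python standard library rather than Beautiful Soup, and the sample HTML (file names and alt strings) is invented for illustration; the real page's markup will differ.

```python
from html.parser import HTMLParser


class AltTextCollector(HTMLParser):
    """Collect (src, alt) for every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attrs = dict(attrs)
            self.images.append((attrs.get("src"), attrs.get("alt")))


# Invented sample: a table-based layout with equation images.
SAMPLE = """<table><tr>
<td><img src="eq1.gif" alt="zeta(s)=sum_(n=1)^(infty)1/(n^s)"></td>
<td><img src="spacer.gif"></td>
</tr></table>"""

if __name__ == "__main__":
    collector = AltTextCollector()
    collector.feed(SAMPLE)
    for src, alt in collector.images:
        if alt:  # skip decorative images without alt-text
            print(src, "->", alt)
```

This only gets you the raw alt strings; stitching together equations that are split across several images and converting the result to LaTeX would still be manual work.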

Optimization methods for iOS pdf render slowness

I have a PDF reader app which renders PDF files. It works fine for normal PDF files, but for some big magazine files it is really slow to render a page. I then tried loading the same file into GoodReader; it is slightly better than my app, but still very slow. That suggests this kind of PDF really needs to be optimized before it is used on an iOS device.
I've tried Adobe Acrobat 10 to reduce the file size, but the result is not very noticeable. I also have another, similar magazine PDF that renders pretty fast in my reader, but I can't tell the difference between the two. I think there must be some key factors that affect PDF rendering, but unfortunately I have no idea what they are.
Can anybody advise how to optimize a PDF file? Is there any good software for that? Thanks
If you have control over the generation of your files, I would suggest avoiding complex compression algorithms such as JBIG2 and reducing the resolution (not the compression quality) of your raster images. JBIG2 is only used for black and white images, so maybe this is why you are getting slow performance with some files and not with others.
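To illustrate what "reducing the resolution" means in numbers, here is a small sketch. The 150 ppi target and the image dimensions are assumptions chosen for the example, and the function names are made up; the only fixed fact is that a default PDF unit is 1/72 inch.

```python
# Sketch: work out how far a placed image can be downsampled.
# Assumed example values: 3000 x 2000 px image placed 576 pt (8 in) wide,
# target of 150 pixels per inch.

POINTS_PER_INCH = 72.0  # default PDF user space unit is 1/72 inch


def effective_ppi(pixels, placed_points):
    """Pixels per inch of an image dimension as placed on the page."""
    return pixels / (placed_points / POINTS_PER_INCH)


def downsample_size(width_px, height_px, placed_width_pt, target_ppi):
    """Return the pixel size after downsampling to target_ppi (never upscales)."""
    ppi = effective_ppi(width_px, placed_width_pt)
    if ppi <= target_ppi:
        return width_px, height_px
    scale = target_ppi / ppi
    return round(width_px * scale), round(height_px * scale)


if __name__ == "__main__":
    print(effective_ppi(3000, 576))               # 375.0
    print(downsample_size(3000, 2000, 576, 150))  # (1200, 800)
```

In that example the image carries more than six times as many pixels as the target needs, all of which have to be decoded on the device for every page render.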
Text should not be a problem in general, as it is usually straightforward to render, but you could try avoiding fully embedded fonts where possible to keep the file size small.
If you will be using these files in a web scenario, I would also recommend using Linearized PDF files.
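If you want to check whether a file you already have is linearized, a quick heuristic is to look for the /Linearized key near the start of the file, since the linearization dictionary has to sit within the first 1024 bytes. This is only a sketch (the function name is made up, and the byte strings in the usage lines are toy data, not complete PDFs); a file that was edited after linearization may still carry the key without benefiting from it.

```python
def looks_linearized(data: bytes) -> bool:
    """Heuristic: PDF header present and /Linearized within the first 1024 bytes."""
    return data.startswith(b"%PDF-") and b"/Linearized" in data[:1024]


if __name__ == "__main__":
    # Toy byte strings for illustration, not complete PDF files.
    linear = b"%PDF-1.4\n1 0 obj\n<< /Linearized 1 /L 12345 >>\nendobj\n"
    plain = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    print(looks_linearized(linear))  # True
    print(looks_linearized(plain))   # False
```

For a real file you would pass in the first kilobyte or so, e.g. `open(path, "rb").read(1024)`.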

Convert Doc,Docx to TIFF with delphi

Hi
How can I convert doc/docx files to TIFF with Delphi?
In short, you can't.
Doc and TIFF are two completely different things. It's not like converting from BMP to TIFF (two image formats), or WAV to MP3 (two audio formats). For very limited Word documents, I suppose you could run Word through OLE automation (or maybe even embed Word in your application for better control), then take a screenshot, but I think your problem runs deeper than that. Maybe you could provide some more info about what you are trying to achieve?
I've done it from within Word; however, the code is long lost, I'm sorry.
I created an Office plugin using the Add-in Express Component.
I used Word automation to convert the current document to RTF, used WP-Tools to render, which gave me the bitmap for each page. Finally I used GDI+ to create the multi-page TIFF.
The standard trick is the same as for Word to PDF: find a virtual printer that outputs TIFFs, and drive Word over OLE to print to that virtual printer.
If I put "tiff printer virtual" into Google, I see quite a few hits (not all free, though, and of course relying on two programs, Word plus the printer, complicates installation).
Word is not able to save its documents in TIFF format. Your best option is to use third-party software which can do that. Just google for "Doc to Tiff".
When looking for tools to do this, you should also be aware that not all TIFF files are faxable. TIFF files can contain a whole range of image formats and sizes. You need to find a tool which can convert your document to monochrome bitmaps 1728 pixels wide, with the page images each in a single strip and with a compression method supported by your fax software.
Good fax software usually comes with a fax printer driver; check with the maker of your fax software whether they have one. With a driver you can simply use OLE Automation to make Word print the document to this driver. The fax software we use expects the fax number and other parameters embedded in the text like this: ##NUMBER12345678##

Resources