Mathematica 9.0.1.0, Linux.
Create a notebook cell with only the word "Section" and apply the format "Section" to it. Then create a variable x and evaluate it. Then print the two-cell notebook to a PDF file. (We often have to pass these back and forth via email to mobile users.) The resulting PDF file is just under 1MB. A few more modest additions, and Mma print-to-file yields a 2-3MB file from about one page of notebook. For comparison, my 800-page dense LaTeX-generated book with R graphics consumes about 4MB.
Can Mma be instructed to produce more compact PDF files? I know it can rasterize graphics, but this isn't really a graphics problem.
This comes from the folks at Wolfram support:
The pdf files are large because they contain embedded fonts for faithful reproduction.
One way to reduce the file size would be to set certain options below to False.
This can be done from the Mathematica menu by going to Format->Options Inspector, selecting 'Global Preferences' under 'Show option values', and then going to Notebook Options->Printing Options and setting EmbedExternalFonts to False.
Do the same for Notebook Options->Printing Options->EmbedStandardPostScriptFonts: set it to False.
However, the PDF that is generated may not look exactly like you want it, especially if you send it to someone else. However, if you just want to keep the PDF on your machine, where the fonts exist anyway, this may be a good default option.
Apparently, their developers are working on the problem, too.
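To check whether embedded fonts really account for the size of a given file, a rough count of embedded font programs can help. The following is only a sketch under stated assumptions: the function name is made up for illustration, and it scans the raw bytes for the `/FontFile`, `/FontFile2` and `/FontFile3` keys that font descriptors use to point at embedded font data. It will miss descriptors stored inside compressed object streams, so a count of zero is not proof that nothing is embedded.

```python
import re

# Font descriptors reference embedded font programs through one of these keys:
# /FontFile (Type 1), /FontFile2 (TrueType), /FontFile3 (CFF and others).
# The lookahead rejects longer names that merely start with "/FontFile".
FONT_FILE_KEY = re.compile(rb"/FontFile[23]?(?![A-Za-z0-9])")


def count_font_file_refs(pdf_bytes):
    """Count /FontFile* keys visible in the raw PDF bytes (heuristic only)."""
    return len(FONT_FILE_KEY.findall(pdf_bytes))
```

Reading a printed notebook into bytes and comparing the count with EmbedExternalFonts set to True and then to False should show whether the setting took effect.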
Is there a command-line tool to remove all spot color channels from a vector input image (type can be ai, eps) and keep only the CMYK or RGB color channels?
What I've been able to come up with so far is using Ghostscript's tiffsep device and then recombining the color channel images into one image using ImageMagick's -combine option. The drawback of this method is that it is quite complicated and I end up with a TIFF image instead of the original (vector) format.
'Image' has a defined meaning in PostScript, it means a bitmap, a raster. I think, from the context, that you mean something more general.
The simple answer is no, in general you can't do this, and I don't know of any tool which will.
The reason is that to do so would lose information; the marks defined in Separation or DeviceN space would be lost entirely, and it's generally regarded as a Bad Idea to discard random parts of the document.
Perhaps you could explain what you are trying to achieve with this (ie why are you doing this), and it might be possible to suggest an alternative method.
If you are a competent C programmer you could produce a Ghostscript subclass device using the existing FILTER device (in gdevflt.c) as a template. That device looks at the type of operation, and either passes it on to the output device, or throws it away. It would be reasonably simple to look at the current colour space and discard Separation or DeviceN space. If you then use the pdfwrite/ps2write/eps2write output device you'd get an EPS, PostScript program or PDF file as the output.
Whether you go down this route, continue with what you have, or find an alternative approach, there are a couple of things you need to think about: how do you plan to tackle Separation inks with process colour names? E.g. /Separation /Black. What about DeviceN spaces where some of the inks are process colours? E.g. a duotone Black and Pantone ink. Should these be preserved or discarded?
Your current approach will use the parts of the object which mark process plates, but not those which mark spot colours, which could give some very peculiar results.
[EDIT]
PDF, PostScript and EPS don't have 'layers' (PDF has a feature, Optional Content, which uses the term 'layers' as a description in the specification but that's all).
An application such as Photoshop and Illustrator can have layers, but in general what they export to has to have those 'layers' converted into something else. That 'something else' depends on what you are saving it as.
Part of the problem is that you are apparently trying to deal with 3 different kinds of input, you say Illustrator (PDF, more or less), Photoshop (raster image) and EPS (PostScript). There is little common ground between the 3, is there a reason to support all of them ?
If you are content to stick with just Illustrator you might be able to do something with Optional Content. I'm not terribly familiar with modern versions of Illustrator, but wouldn't it be simpler to save two versions of the file, one with the answer layer and one without ?
Anyway, Ghostscript can honour Optional Content, so if you can save a PDF file (not PostScript or EPS) from Illustrator, it may be that the layers will persist into the PDF as Optional Content. I suspect they will going by a quick Google. In that case you might be able to run the file through Ghostscript, telling it not to honour the Optional Content portion, and get a PDF file without it present.
Another solution (again limited to PDF) would be to open the PDF file with an editing application such as Acrobat Pro, and simply delete the bits you don't want. Deletion of that kind is relatively reliable.
It still feels like rather a long-winded way to get a PDF file with some of the content removed though. I can't help feeling that just saving two versions from the creating application would be easier.
Would it be possible to derive the text, images, and LaTeX equations from a particular website so that you can directly customize your own PDF without having the objects blurry? Only the image will have a fixed resolution.
I realize that there are a couple of ways of generating a PDF indirectly. Attempting to render a PDF from Wolfram MathWorld on the Riemann Zeta Function, for instance, would be possible by printing and saving it as a PDF via Chrome, but as you zoom in more closely, the LaTeX equations and text naturally become blurry. I tried downloading "Wolfram's CDF Player," but it contains only the syntax for Mathematica's libraries - not the helpful explanations that Wolfram MathWorld provides. What would be required for me to extract the text, images, and LaTeX equations into a PDF file without having them blurry?
Unless you have access to the LaTeX source that was used to produce the images in a way that isn't apparent from your question, the answer is "you cannot." Casual inspection of the website linked implies that the LaTeX that is used to produce the equations is not readily available (it's probably on a backend system somewhere that produces the images that get put on the web server).
To a browser, it's just an image. The method by which the image was produced is irrelevant to how it appears on the web page, and how it would appear in a PDF (ie. more pixelated than desired).
Note that if a website uses a vector-graphics format like SVG instead of a pixel based format like PNG or JPEG, then those will translate to PDF cleanly, and will zoom nicely. That's a choice that would be made by the webmaster of the site in question.
Inspecting the source reveals that the GIFs depicting each equation have alt-text that approximates the LaTeX that would render them (it might be Mathematica code; I'm not familiar with Wolfram's tools). Extracting a reasonable source wouldn't be impossible, but it would be hard. The site is laid out with tables, so even with something like Beautiful Soup, parsing the HTML could be tricky. Some equations are broken up into different GIFs, so parsing them would be even trickier. You'd also have to convert from whatever the alt-text is to LaTeX.
All in all, if you don't need to do a zillion pages, I'd suggest copy-pasting the text, saving the images, grabbing the alt-text of each image and doing the converting yourself.
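If doing it by hand gets tedious, the alt-text grabbing step can be scripted. This is a minimal sketch, not a full scraper: it uses Python's standard-library `html.parser` rather than Beautiful Soup, the class and function names are invented for illustration, and it assumes the page HTML has already been saved to a string. It only collects the `alt` attribute of each `img` tag in document order; converting that text to LaTeX and stitching together equations split across several images is still up to you.

```python
from html.parser import HTMLParser


class AltTextCollector(HTMLParser):
    """Collects the non-empty alt attribute of every <img> tag, in order."""

    def __init__(self):
        super().__init__()
        self.alts = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <img ... /> tags also arrive here via the default
        # handle_startendtag implementation.
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.alts.append(alt)


def collect_alt_texts(html):
    """Return the list of alt texts found in an HTML string."""
    parser = AltTextCollector()
    parser.feed(html)
    parser.close()
    return parser.alts
```

Because the parser does not care about the surrounding table layout, the nesting of the page does not get in the way of this particular step.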
For the given example, you could download the Mathematica notebook for that page. Maybe it is possible to parse something from that.
I have an app (written in D2010) which is similar to a text retrieval app... It has a list of questions, with their corresponding answers. Most answers are strictly text, but some answers have graphics and formatting. My dilemma has to do with the formatted answers. The user should be able to copy such an answer (formatting and graphics) in order to paste it into another app. I have tried using a Word OCX. This is a little problematic: the user has to have Word, it gives random errors when used inside a virtual machine, etc. I am now playing with using a built-in browser component and viewing the data as a PDF. This is nice and easy, but when I copy and paste it, I lose all formatting, and the graphic shows up as a large, totally black box.
I can store the data in whatever format I choose. It is stored as a BLOB in a DB file. I write it to a temp file and then I call some type of viewing routine, so I have flexibility there. My issue is really, what viewer mechanism is simple to implement, and allows copying/pasting, while maintaining text formatting (bullets, indents, etc) and graphics.
Thanks,
GS
The TRichEdit (or any of TRichEdit descendants or similar classes) will allow the users to visualize text formatting and images, and when the content is copied, the RTF representation of the data will be copied into the clipboard.
When the clipboard data is pasted into a RTF compatible text editor (like Wordpad and Word), all the formatting, bullets and images are preserved.
I have a PDF reader app which renders PDF files. It works fine for normal PDF files, but for some big magazine files it's really slow to render a page. I then tried loading the same file into GoodReader; it's slightly better than my app, but still very slow. That suggests this kind of PDF really needs to be optimized before it's used on an iOS device.
I've tried Adobe Acrobat 10 to reduce the file size, but the improvement is not very noticeable. Another, similar magazine PDF renders pretty fast in my reader, and I can't tell what the difference is. I think there must be some key factors that affect PDF rendering, but unfortunately I have no idea what they are.
Can anybody advise how to optimize a PDF file? Is there any good software for that? Thanks
If you have control over the generation of your files, I would suggest avoiding complex compression algorithms such as JBIG2 and reducing the resolution (not the compression quality) of your raster images. JBIG2 is only used for black-and-white images, so maybe this is why you are getting slow performance with some files and not with others.
Text should not be a problem in general, as it is usually straightforward to render, but you could try avoiding fully embedded fonts where possible to keep the file size small.
If you will be using these files in a web scenario, I would also recommend using Linearized PDF files.
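To compare a slow file against a fast one, a quick byte-level check for the two features mentioned above can be a starting point. This is a rough sketch with a made-up function name, and both checks are heuristics: a linearized PDF carries a linearization dictionary containing the `/Linearized` key near the start of the file (the code assumes it falls within the first 1024 bytes), and JBIG2-compressed images name the `/JBIG2Decode` filter in their stream dictionaries. Entries hidden inside compressed object streams are not visible to this scan.

```python
def quick_pdf_report(pdf_bytes):
    """Heuristic flags for a PDF given as raw bytes (not a real parser)."""
    return {
        # The linearization dictionary sits at the very start of the file.
        "linearized": b"/Linearized" in pdf_bytes[:1024],
        # Each JBIG2 image stream names this filter in its dictionary.
        "jbig2_filters": pdf_bytes.count(b"/JBIG2Decode"),
    }
```

Running it over both magazine files and comparing the two dictionaries may show whether JBIG2 images are what separates them.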
I am looking for libraries that would help in programmatically manipulating EPS (Encapsulated PostScript) files. Basically, what I want to do is the following:
Show / Hide preexisting layers in the EPS file (toggle them on and off)
Fill (color) named shapes in the EPS file
Retrieve coordinates of named points in the EPS file
Draw shapes on a new layer in the EPS file
on a server, without user interaction (scripting Adobe Illustrator won't work)
I am aware of how the EPS file format is based on the PostScript language and must therefore be interpreted - for creating simple drawings from scratch this is rather easy. But for actually modifying existing files, I guess you need a library that interprets the file and provides some kind of "DOM" for manipulation.
Can I even have named shapes and points inside an EPS file?
EDIT: Assuming I had the layers saved in separate EPS files. Or better still: Just the "data" part of the layers. Could I then concatenate this stuff to create a new EPS file? And append drawing commands? Fill existing named objects?
This is extremely difficult and here is why: a PS file is a program whose execution results in pixels put on a page. Instructions in a PS program are at the level of "draw a line using the current pen and color" or "rotate the coordinate system by 90 degrees", but there is no notion of layers or complex objects like you would see in a vector drawing application.
There are very few conventions in the structure of PS files that allow external programs to modify them: pages are marked separately, and font resources and media dimensions are spelled out in special comments. This is especially true for Encapsulated PostScript (EPS), which must follow these guidelines because EPS files are meant to be read by applications, but not for general PS as it is sent to a printer. A PS program is at a much lower level of abstraction than what you need, and there is no way to reconstruct the higher-level structure for arbitrary PS code. In principle, a PS file could result in different output every time it is printed, because it may query its execution environment and branch based on random decisions.
Applications like Adobe Illustrator emit PS code that follows a rigid structure. There is a chance that this could be parsed and manipulated without interpreting the code. I would still suggest rethinking the current architecture: you are at too low a level of abstraction for what you need.
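As an illustration of what those special comments do allow, here is a small sketch that reads the `%%BoundingBox:` comment from an EPS file without interpreting any PostScript. The function name is invented for this example, and the code assumes integer coordinates and `\n` or `\r\n` line endings. A header that defers the value with `(atend)` is skipped, so the numeric comment in the trailer is picked up instead.

```python
import re

# Matches a structured comment of the form "%%BoundingBox: llx lly urx ury".
BBOX = re.compile(
    rb"^%%BoundingBox:\s*(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)",
    re.MULTILINE,
)


def eps_bounding_box(eps_bytes):
    """Return (llx, lly, urx, ury) from the first numeric %%BoundingBox, or None."""
    match = BBOX.search(eps_bytes)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())
```

This is about as far as comment-level parsing goes: the comments describe the file as a whole, not named shapes or layers inside it.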
PDF is not manipulable since it is not possible to change any existing parts of a PDF (in general), only add stuff. EPS is the same as PostScript except that it has a bounding-box header.
The problem with doing what you want is that PS is a programming language whose output (mostly) is some kind of image. So the question could be stated as "how can I draw shapes on a new layer in the Java file". You probably need to generate the complete PS on the fly, or use another image format altogether.
I am not aware of any available libraries for this, but you may be able to build something to meet your needs based on epstool from Ghostscript/GSview.
I think your best bet is to generate a PDF from the EPS and then manipulate the PDF. Then back to EPS. PDF is much more "manipulable" than is EPS.
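The round trip could be driven from a server-side script along these lines. This is only a sketch under assumptions: the helper names are made up, Ghostscript is assumed to be installed and callable as `gs` (on Windows the executable has a different name), and the `eps2write` device is assumed to be available in the installed version. The functions only build the argument lists; passing them to `subprocess.run` performs the conversion.

```python
def eps_to_pdf_cmd(eps_path, pdf_path):
    """Ghostscript arguments to convert an EPS file into a single-page PDF."""
    return [
        "gs",
        "-dSAFER",
        "-sDEVICE=pdfwrite",
        "-dEPSCrop",          # size the page to the EPS bounding box
        "-o", pdf_path,       # -o also implies batch mode with no pauses
        eps_path,
    ]


def pdf_to_eps_cmd(pdf_path, eps_path):
    """Ghostscript arguments to convert a one-page PDF back into EPS."""
    return [
        "gs",
        "-dSAFER",
        "-sDEVICE=eps2write",
        "-o", eps_path,
        pdf_path,
    ]
```

The manipulation step in between is left to whichever PDF library is used, and the EPS that comes back is regenerated by Ghostscript, so it will not match the structure of the original file.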