How to programmatically manipulate an EPS file - eps

I am looking for libraries that would help in programmatically manipulating EPS (Encapsulated PostScript) files. Basically, what I want to do is the following:
Show / Hide preexisting layers in the EPS file (toggle them on and off)
Fill (color) named shapes in the EPS file
Retrieve coordinates of named points in the EPS file
Draw shapes on a new layer in the EPS file
All of this has to run on a server, without user interaction (scripting Adobe Illustrator won't work)
I am aware of how the EPS file format is based on the PostScript language and must therefore be interpreted - for creating simple drawings from scratch this is rather easy. But for actually modifying existing files, I guess you need a library that interprets the file and provides some kind of "DOM" for manipulation.
Can I even have named shapes and points inside an EPS file?
EDIT: Assuming I had the layers saved in separate EPS files. Or better still: Just the "data" part of the layers. Could I then concatenate this stuff to create a new EPS file? And append drawing commands? Fill existing named objects?

This is extremely difficult and here is why: a PS file is a program whose execution results in pixels put on a page. Instructions in a PS program are at the level of "draw a line using the current pen and color" or "rotate the coordinate system by 90 degrees", but there is no notion of layers or complex objects as you would see in a vector drawing application.
There are very few conventions in the structure of PS files that allow external programs to modify them: pages are marked separately, and font resources and media dimensions are spelled out in special comments. This is especially true for Encapsulated PostScript (EPS), which must follow these guidelines because EPS files are meant to be read by applications, but not for general PS as it is sent to a printer. A PS program is at a much lower level of abstraction than what you need, and there is no way to reconstruct the higher-level structure from arbitrary PS code. In principle, a PS file could produce different output every time it is printed, because it may query its execution environment and branch based on random decisions.
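To illustrate the comment conventions mentioned above, here is a minimal sketch that reads the header comments (lines starting with `%%`) from the text of an EPS file. The function names `read_dsc_header` and `bounding_box` and the sample file are made up for illustration, and the sketch assumes a plain-text EPS without a binary preview header.

```python
def read_dsc_header(eps_text):
    """Collect '%%Key: value' comments from the header of an EPS file.

    Assumes plain-text EPS. Stops at %%EndComments or at the first
    line that is not a comment.
    """
    header = {}
    for line in eps_text.splitlines():
        if line.startswith("%%EndComments"):
            break
        if not line.startswith("%"):
            break
        if line.startswith("%%") and ":" in line:
            key, _, value = line[2:].partition(":")
            header[key.strip()] = value.strip()
    return header


def bounding_box(header):
    """Return %%BoundingBox as four ints, or None if absent or deferred."""
    value = header.get("BoundingBox")
    if value is None or value == "(atend)":
        return None
    return tuple(int(v) for v in value.split())


# Hypothetical sample file, for illustration only.
SAMPLE_EPS = """%!PS-Adobe-3.0 EPSF-3.0
%%BoundingBox: 0 0 200 100
%%Title: demo
%%EndComments
newpath 10 10 moveto 190 90 lineto stroke
"""

header = read_dsc_header(SAMPLE_EPS)
print(bounding_box(header))
```

Note that this only recovers what the comments declare. It says nothing about what the drawing code below the header actually does.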
Applications like Adobe Illustrator emit PS code that follows a rigid structure. There is a chance that this code could be parsed and manipulated without interpreting it. I would still suggest rethinking the current architecture: you are at too low a level of abstraction for what you need.

PDF is not manipulable, since in general it is not possible to change any existing parts of a PDF, only to add stuff. EPS is the same as PostScript except that it has a bounding box header.
The problem with doing what you want is that PS is a programming language whose output (mostly) is some kind of image. So the question could be stated as "how can I draw shapes on a new layer in the Java file". You probably need to generate the complete PS on the fly, or use another image format altogether.
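A minimal sketch of the "generate the complete PS on the fly" idea: the application keeps its own model of layers and shapes, and writes a fresh EPS containing only the visible layers. The layer model, the function name `render_eps` and the sample data are assumptions made for this example. Only rectangles are handled, using the Level 2 `rectfill` operator.

```python
def render_eps(width, height, layers):
    """Build a minimal EPS document as a string.

    layers: list of dicts {"name": str, "visible": bool, "rects": [...]},
    where each rect is (x, y, w, h, (r, g, b)) with colours in 0..1.
    Hidden layers are simply not written out.
    """
    lines = [
        "%!PS-Adobe-3.0 EPSF-3.0",
        f"%%BoundingBox: 0 0 {width} {height}",
        "%%EndComments",
    ]
    for layer in layers:
        if not layer["visible"]:
            continue
        lines.append(f"% layer: {layer['name']}")
        for x, y, w, h, (r, g, b) in layer["rects"]:
            lines.append(f"{r} {g} {b} setrgbcolor")
            lines.append(f"{x} {y} {w} {h} rectfill")
    lines.append("%%EOF")
    return "\n".join(lines) + "\n"


# Hypothetical model: two layers, one switched off.
layers = [
    {"name": "background", "visible": True,
     "rects": [(0, 0, 200, 100, (1, 1, 1))]},
    {"name": "answers", "visible": False,
     "rects": [(20, 20, 50, 30, (1, 0, 0))]},
]
print(render_eps(200, 100, layers))
```

The point of the design is that "layers" and "named shapes" live in the application's data, not in the PostScript. Toggling a layer means regenerating the file.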

I am not aware of any available libraries for this, but you may be able to build something to meet your needs based on epstool from Ghostscript/GSview.

I think your best bet is to generate a PDF from the EPS and then manipulate the PDF. Then back to EPS. PDF is much more "manipulable" than is EPS.
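A rough sketch of that round trip, assuming Ghostscript is installed and available as `gs` on the server (the executable is named differently on Windows). The helper names are made up. The `pdfwrite` and `eps2write` devices and the `-dEPSCrop` switch are standard Ghostscript options, but the conversion is not guaranteed to preserve everything in the original file.

```python
import subprocess


def eps_to_pdf_cmd(eps_path, pdf_path):
    """Command line that converts an EPS to a PDF cropped to its bounding box."""
    return ["gs", "-dBATCH", "-dNOPAUSE", "-dEPSCrop",
            "-sDEVICE=pdfwrite", f"-sOutputFile={pdf_path}", eps_path]


def pdf_to_eps_cmd(pdf_path, eps_path):
    """Command line that converts a (single-page) PDF back to EPS."""
    return ["gs", "-dBATCH", "-dNOPAUSE",
            "-sDEVICE=eps2write", f"-sOutputFile={eps_path}", pdf_path]


def run(cmd):
    """Run a command, raising CalledProcessError if Ghostscript fails."""
    subprocess.run(cmd, check=True)


# Usage (file names are placeholders):
#   run(eps_to_pdf_cmd("drawing.eps", "drawing.pdf"))
#   ... manipulate drawing.pdf with a PDF library of your choice ...
#   run(pdf_to_eps_cmd("drawing.pdf", "drawing_out.eps"))
```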

Related

How to remove spot color (s) from an image

Is there a command line tool to remove all spot color channels from a vector input image (type can be ai, eps) and keep only the CMYK or RGB color channels?
What I've been able to come up with so far is using the Ghostscript tiffsep device and then recombining the color channel images into one image using ImageMagick's -combine option. The drawback of this method is that it is quite complicated and I end up with a TIFF image instead of the original (vector) format.
'Image' has a defined meaning in PostScript, it means a bitmap, a raster. I think, from the context, that you mean something more general.
The simple answer is no, in general you can't do this, and I don't know of any tool which will.
The reason is that to do so would lose information; the marks defined in Separation or DeviceN space would be lost entirely, and it's generally regarded as a Bad Idea to discard random parts of the document.
Perhaps you could explain what you are trying to achieve with this (ie why are you doing this), and it might be possible to suggest an alternative method.
If you are a competent C programmer you could produce a Ghostscript subclass device using the existing FILTER device (in gdevflt.c) as a template. That device looks at the type of operation, and either passes it on to the output device or throws it away. It would be reasonably simple to look at the current colour space and discard anything in a Separation or DeviceN space. If you then use the pdfwrite/ps2write/eps2write output device you'd get a PDF file, PostScript program or EPS as the output.
Whether you go down this route, continue with what you have, or find an alternative approach, there are a couple of things you need to think about. How do you plan to tackle Separation inks with process colour names, e.g. /Separation /Black? What about DeviceN spaces where some of the inks are process colours, e.g. a duotone of Black and a Pantone ink? Should these be preserved or discarded?
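To make those questions concrete, here is a small sketch of the decision a filter would have to make for each colour space, given the list of ink names it uses. The policy shown (keep pure process, discard pure spot, flag mixtures) is only one possible choice, and the function names and sample ink names are invented for the example.

```python
# The four process ink names used by CMYK devices.
PROCESS_INKS = {"Cyan", "Magenta", "Yellow", "Black"}


def split_inks(ink_names):
    """Split a Separation/DeviceN ink list into (process, spot) names."""
    process = [n for n in ink_names if n in PROCESS_INKS]
    spot = [n for n in ink_names if n not in PROCESS_INKS]
    return process, spot


def decide(ink_names):
    """Return 'keep', 'discard' or 'mixed' for one colour space.

    'mixed' is the awkward case (e.g. a duotone of Black and a spot ink)
    where neither keeping nor discarding the whole object is clearly right.
    """
    process, spot = split_inks(ink_names)
    if not spot:
        return "keep"
    if not process:
        return "discard"
    return "mixed"


print(decide(["Black"]))                   # a Separation named after a process ink
print(decide(["PANTONE 185 C"]))           # a pure spot colour
print(decide(["Black", "PANTONE 185 C"]))  # a duotone
```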
Your current approach will use the parts of the object which mark process plates, but not those which mark spot colours, which could give some very peculiar results.
[EDIT]
PDF, PostScript and EPS don't have 'layers' (PDF has a feature, Optional Content, which uses the term 'layers' as a description in the specification but that's all).
An application such as Photoshop and Illustrator can have layers, but in general what they export to has to have those 'layers' converted into something else. That 'something else' depends on what you are saving it as.
Part of the problem is that you are apparently trying to deal with 3 different kinds of input: you mention Illustrator (PDF, more or less), Photoshop (raster image) and EPS (PostScript). There is little common ground between the 3. Is there a reason to support all of them?
If you are content to stick with just Illustrator you might be able to do something with Optional Content. I'm not terribly familiar with modern versions of Illustrator, but wouldn't it be simpler to save two versions of the file, one with the answer layer and one without ?
Anyway, Ghostscript can honour Optional Content, so if you can save a PDF file (not PostScript or EPS) from Illustrator, it may be that the layers will persist into the PDF as Optional Content. I suspect they will going by a quick Google. In that case you might be able to run the file through Ghostscript, telling it not to honour the Optional Content portion, and get a PDF file without it present.
Another solution (again limited to PDF) would be to open the PDF file with an editing application such as Acrobat Pro, and simply delete the bits you don't want. Deletion of that kind is relatively reliable.
It still feels like rather a long-winded way to get a PDF file with some of the content removed though. I can't help feeling that just saving two versions from the creating application would be easier.

Import/export or store/restore xShapes in LibreOffice/OpenOffice Draw via API

As the title says, I want to programmatically extract a shape from a Draw document through the API. Besides that, I want to import such a shape into a document as well.
I saw some predefined shapes in XML form, and the document is stored as an XML structure as well. Does anybody know a way to store and load a single shape?
What is this good for?
I want to, for example, enable the programmatic deletion of objects. But to enable the undo/redo functionalities I need to “store” the deleted shape. Beyond that this would allow me to add user-defined objects programmatically, e.g. arrow heads, UML structures or unicorns.
Thanks in advance for any ideas,
J
P.S.: I work with LibreOffice Version: 5.2.1.2 . Access the interface through C# (so java and C++ would do it as well) but any ideas are welcome.
I'm not entirely sure what you are trying to do, but here are some ideas:
Instead of deleting an XShape, you could use the dispatcher to Cut it. That will store it in the clipboard, so if it needs to be added back then the dispatcher can Paste it, as long as no other copy or cut was performed.
To create a shape, see the example at https://wiki.openoffice.org/wiki/Documentation/DevGuide/Drawings/Shapes. This code will look different depending on what kind of shape it is. It sounds like you are asking for one code listing that will programmatically create any type of shape, but I do not think it is that easy.
Instead of using the UNO API, you could programmatically modify the XML files, which may make it easier to store and work with any shape. Be sure to use an XML parsing library, not just regular expressions.
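As a sketch of the XML route: an .odg file is a zip archive whose drawing lives in content.xml, and shapes there are elements in the ODF drawing namespace that can carry a draw:name attribute. The function below removes a named shape and returns its XML so it can be stored for a later undo. `pop_shape` and the sample document are made up for illustration, and a real content.xml uses many more namespaces, which would also need registering to keep their prefixes on output.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
# Keep readable prefixes when serialising.
ET.register_namespace("office", OFFICE_NS)
ET.register_namespace("draw", DRAW_NS)


def pop_shape(content_xml, shape_name):
    """Remove the first element whose draw:name equals shape_name.

    Returns (new_document_xml, removed_element_xml). If nothing matches,
    returns (content_xml, None).
    """
    root = ET.fromstring(content_xml)
    name_attr = "{%s}name" % DRAW_NS
    for parent in root.iter():
        for child in list(parent):
            if child.get(name_attr) == shape_name:
                parent.remove(child)
                child.tail = None  # drop trailing whitespace from the fragment
                removed = ET.tostring(child, encoding="unicode")
                return ET.tostring(root, encoding="unicode"), removed
    return content_xml, None


# Hypothetical, heavily reduced content.xml.
SAMPLE = (
    '<office:document-content xmlns:office="%s" xmlns:draw="%s">'
    '<office:body><office:drawing><draw:page draw:name="page1">'
    '<draw:rect draw:name="box"/><draw:ellipse draw:name="dot"/>'
    '</draw:page></office:drawing></office:body>'
    '</office:document-content>' % (OFFICE_NS, DRAW_NS)
)

new_doc, removed = pop_shape(SAMPLE, "box")
print(removed)
```

Re-inserting the stored fragment would be the reverse: parse it with `ET.fromstring` and append it to the page element.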

can Mathematica be instructed to print-to-file smaller pdf files?

Mathematica 9.0.1.0, Linux.
Create a notebook cell with only the word "Section" and apply the format "Section" to it. Then create a variable x and evaluate it. Then print the two-cell notebook to a PDF file. (We often have to pass these back and forth via email to mobile users.) The resulting PDF file is just under 1MB. A few more modest additions, and Mma print-to-file yields 2-3MB files from about one page of notebook. For comparison, my 800-page dense LaTeX-generated book with R graphics consumes about 4MB.
Can Mma be instructed to produce more compact PDF files? I know it can rasterize graphics, but this isn't really a graphics problem.
This comes from the folks at Wolfram support:
The pdf files are large because they contain embedded fonts for faithful reproduction.
One way to reduce the file size would be to set certain options below to False.
This can be done from the Mathematica menu: go to Format -> Options Inspector, select 'Global Preferences' under 'Show option values', then go to Notebook Options -> Printing Options and set EmbedExternalFonts to False.
Do the same for Notebook Options -> Printing Options -> EmbedStandardPostScriptFonts.
Note that the generated PDF may not look exactly like you want it to, especially if you send it to someone else. If you just want to keep the PDF on your machine, where the fonts exist anyway, this may be a good default option.
Apparently, their developers are working on the problem, too.

Derive Text, Images, and LaTeX Equations from Websites

Would it be possible to derive the text, images, and LaTeX equations from a particular website so that you can directly customize your own PDF without having the objects blurry? Only the image will have a fixed resolution.
I realize that there are a couple of ways of generating a PDF indirectly. Rendering a PDF from the Wolfram MathWorld page on the Riemann Zeta Function, for instance, is possible by printing and saving it as a PDF via Chrome, but as you zoom in more closely, the LaTeX equations and text naturally become blurry. I tried downloading "Wolfram's CDF Player," but it contains only the syntax for Mathematica's libraries, not the helpful explanations that Wolfram MathWorld provides. What would be required for me to extract the text, images, and LaTeX equations into a PDF file without having them blurry?
Unless you have access to the LaTeX source that was used to produce the images in a way that isn't apparent from your question, the answer is "you cannot." Casual inspection of the website linked implies that the LaTeX that is used to produce the equations is not readily available (it's probably on a backend system somewhere that produces the images that get put on the web server).
To a browser, it's just an image. The method by which the image was produced is irrelevant to how it appears on the web page, and how it would appear in a PDF (ie. more pixelated than desired).
Note that if a website uses a vector-graphics format like SVG instead of a pixel based format like PNG or JPEG, then those will translate to PDF cleanly, and will zoom nicely. That's a choice that would be made by the webmaster of the site in question.
Inspecting the source reveals that the gifs depicting each equation have alt-text that approximates the LaTeX that would render them (it might be Mathematica code; I'm not familiar with Wolfram's tools). Extracting a reasonable source wouldn't be impossible, but it would be hard. The site is laid out with tables, so even with something like Beautiful Soup, parsing the HTML could be tricky. Some equations are broken up into different gifs, so parsing them would be even trickier. You'd also have to convert from whatever the alt-text is to LaTeX.
All in all, if you don't need to do a zillion pages, I'd suggest copy-pasting the text, saving the images, grabbing the alt-text of each image and doing the converting yourself.
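For the "grab the alt-text of each image" step, a minimal sketch using only Python's standard-library HTML parser could look like this. The class and function names and the sample markup are invented for the example. Converting the collected alt-text to LaTeX is left out, as is stitching together equations that are split across several images.

```python
from html.parser import HTMLParser


class AltTextCollector(HTMLParser):
    """Collect the alt attribute of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.alts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.alts.append(alt)


def image_alt_texts(html):
    """Return the non-empty alt-texts of all images in an HTML string."""
    collector = AltTextCollector()
    collector.feed(html)
    return collector.alts


# Hypothetical snippet in the style of an equation rendered as a gif.
page = '<td><img src="eq1.gif" alt="zeta(s)=sum_(n=1)^(infty)1/(n^s)"></td>'
print(image_alt_texts(page))
```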
For the given example, you could download the Mathematica notebook for that page. Maybe it is possible to parse something from that.

How to parse EPS to get a mesh-kind data?

My goal is to import an EPS file into an app (the language is C++) to create a 3D object. I am looking for a library or tool that will help me parse EPS into a list of primitives (circles, lines, paths, etc., like in SVG) or even an array of contours. I've already tried converting EPS to SVG first using the pstoedit and uniconverter tools, but sometimes both tools produce a wrong conversion with data loss, so I cannot say that this way is acceptable.
Does anyone have experience in this area or have any suggestions?
This is a big project. For starters you will need a PostScript interpreter, there is no alternative to this, since the EPS can contain very nearly anything.
Rather than writing your own interpreter I would suggest you use an existing one, in fact I would suggest you use Ghostscript as it is the only GPL PS interpreter I know of.
You can write a Ghostscript device containing methods which will be executed whenever the relevant operation is interpreted from the input. There is an existing (very limited, incomplete) SVG output device which would get you started.
You are going to have to handle a lot of different kinds of operations if you want a general purpose solution. For instance, PostScript doesn't have a circle primitive, its curves are all Beziers, and there are different kinds of line joins. You will need to consider what to do with images and presumably text (possibly discard these) and shading patterns. You will have to at least understand the various colour spaces which can be used, even if you don't plan on utilising them yourself.
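Since every curve arrives as a cubic Bezier, turning paths into contours means flattening those curves into line segments at some point. A minimal sketch of that step, using uniform sampling of the Bernstein form, is shown below. The function name and the fixed segment count are assumptions made for the example, and a real implementation would more likely subdivide adaptively to a tolerance.

```python
def flatten_cubic(p0, p1, p2, p3, segments=16):
    """Approximate a cubic Bezier curve by a polyline.

    p0..p3 are (x, y) control points. Returns segments + 1 points,
    starting at p0 and ending at p3.
    """
    points = []
    for i in range(segments + 1):
        t = i / segments
        u = 1.0 - t
        # Cubic Bernstein basis weights.
        b0, b1, b2, b3 = u * u * u, 3 * u * u * t, 3 * u * t * t, t * t * t
        x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
        y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
        points.append((x, y))
    return points


# A quarter of a unit circle, as it is commonly approximated by one Bezier.
K = 0.5522847498
quarter = flatten_cubic((1.0, 0.0), (1.0, K), (K, 1.0), (0.0, 1.0))
print(len(quarter), quarter[0], quarter[-1])
```

This also shows why a "circle" cannot be read back out of the file directly: the interpreter only ever sees four such curves, and recognising them as a circle would be a separate fitting step.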
Given that PostScript is inherently 2D I don't really see how you are going to construct a 3D object, but that's a different problem.

Resources