How to parse EPS to get mesh-like data?

My goal is to import an EPS file into an app (written in C++) to create a 3D object. I am looking for a library or tool that will help me parse EPS into a list of primitives (circles, lines, paths, etc., as in SVG) or even an array of contours. I have already tried converting EPS to SVG first, using the pstoedit and uniconvertor tools, but sometimes both tools produce incorrect conversions that lose data, so I cannot say this approach is acceptable.
Does anyone have experience in this area or have any suggestions?

This is a big project. For starters you will need a PostScript interpreter. There is no alternative to this, since the EPS can contain very nearly anything.
Rather than writing your own interpreter I would suggest you use an existing one, in fact I would suggest you use Ghostscript as it is the only GPL PS interpreter I know of.
You can write a Ghostscript device containing methods which will be executed whenever the relevant operation is interpreted from the input. There is an existing (very limited, incomplete) SVG output device which would get you started.
You are going to have to handle a lot of different kinds of operations if you want a general purpose solution. For instance, PostScript doesn't have a circle primitive, its curves are all Beziers, and there are different kinds of line joins. You will need to consider what to do with images and presumably text (possibly discard these) and shading patterns. You will have to at least understand the various colour spaces which can be used, even if you don't plan on utilising them yourself.
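Since the curves arrive as cubic Beziers, a contour array needs them approximated by straight segments at some point. The following is only a minimal sketch of that one step, in Python for brevity; the function name `flatten_cubic_bezier` and the fixed `segments` count are assumptions made for illustration and are not part of Ghostscript or any other tool mentioned here.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def flatten_cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point,
                         segments: int = 16) -> List[Point]:
    """Approximate a cubic Bezier curve with `segments` straight lines.

    p0 and p3 are the end points, p1 and p2 the control points.
    Returns segments + 1 points, starting at p0 and ending at p3.
    """
    if segments < 1:
        raise ValueError("segments must be at least 1")
    points = []
    for i in range(segments + 1):
        t = i / segments
        u = 1.0 - t
        # Cubic Bernstein basis weights for parameter t.
        b0 = u * u * u
        b1 = 3.0 * u * u * t
        b2 = 3.0 * u * t * t
        b3 = t * t * t
        x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
        y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
        points.append((x, y))
    return points


# Hypothetical curve: an arch from (0, 0) to (1, 0).
arch = flatten_cubic_bezier((0, 0), (0, 1), (1, 1), (1, 0), segments=4)
```

Uniform sampling is the simplest choice. A real importer would more likely subdivide adaptively until the segments are within a flatness tolerance.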
Given that PostScript is inherently 2D I don't really see how you are going to construct a 3D object, but that's a different problem.

Related

How to remove spot color (s) from an image

Is there a command line tool to remove all spot color channels from a vector input image (type can be ai, eps) and keep only the CMYK or RGB color channels?
What I've been able to come up with so far is using Ghostscript's tiffsep device and then recombining the color channel images into one image using ImageMagick's -combine option. The drawback of this method is that it is quite complicated and I end up with a TIFF image instead of the original (vector) format.
'Image' has a defined meaning in PostScript, it means a bitmap, a raster. I think, from the context, that you mean something more general.
The simple answer is no, in general you can't do this, and I don't know of any tool which will.
The reason is that to do so would lose information; the marks defined in Separation or DeviceN space would be lost entirely, and it's generally regarded as a Bad Idea to discard random parts of the document.
Perhaps you could explain what you are trying to achieve with this (ie why are you doing this), and it might be possible to suggest an alternative method.
If you are a competent C programmer you could produce a Ghostscript subclass device using the existing FILTER device (in gdevflt.c) as a template. That device looks at the type of operation, and either passes it on to the output device, or throws it away. It would be reasonably simple to look at the current colour space and discard anything in Separation or DeviceN space. If you then use the pdfwrite/ps2write/eps2write output device you'd get a PDF file, PostScript program or EPS as the output.
Whether you go down this route, continue with what you have, or find an alternative approach, there are a couple of things you need to think about. How do you plan to tackle Separation inks with process colour names, e.g. /Separation /Black? What about DeviceN spaces where some of the inks are process colours, e.g. a duotone of Black and a Pantone ink? Should these be preserved or discarded?
Your current approach will use the parts of the object which mark process plates, but not those which mark spot colours, which could give some very peculiar results.
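To make the edge cases above concrete, here is a minimal sketch in Python of the decision such a filter has to take for each colour space. It is not Ghostscript code: the names `PROCESS_INKS`, `split_inks` and `decide` are hypothetical, and the three-way answer is only one possible policy.

```python
# Ink names that refer to process plates rather than spot colours.
PROCESS_INKS = {"Cyan", "Magenta", "Yellow", "Black"}


def split_inks(colorant_names):
    """Split a list of ink names into (process, spot) lists."""
    process = [name for name in colorant_names if name in PROCESS_INKS]
    spot = [name for name in colorant_names if name not in PROCESS_INKS]
    return process, spot


def decide(colorant_names):
    """Return 'keep', 'discard' or 'mixed' for a Separation/DeviceN space.

    'mixed' marks the awkward case (for example a Black + Pantone duotone)
    where a policy decision is still needed. The special colorant names
    All and None are not handled in this sketch.
    """
    process, spot = split_inks(colorant_names)
    if not spot:
        return "keep"      # e.g. /Separation /Black
    if not process:
        return "discard"   # spot inks only
    return "mixed"


# Hypothetical ink lists for illustration.
black_only = decide(["Black"])
pantone_only = decide(["PANTONE 185 C"])
duotone = decide(["Black", "PANTONE 185 C"])
```

The 'mixed' case is the one with no obviously right answer: dropping the object loses the process-colour marks, while keeping it requires rewriting the colour values.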
[EDIT]
PDF, PostScript and EPS don't have 'layers' (PDF has a feature, Optional Content, which uses the term 'layers' as a description in the specification but that's all).
Applications such as Photoshop and Illustrator can have layers, but in general what they export to has to have those 'layers' converted into something else. That 'something else' depends on what you are saving it as.
Part of the problem is that you are apparently trying to deal with 3 different kinds of input, you say Illustrator (PDF, more or less), Photoshop (raster image) and EPS (PostScript). There is little common ground between the 3, is there a reason to support all of them ?
If you are content to stick with just Illustrator you might be able to do something with Optional Content. I'm not terribly familiar with modern versions of Illustrator, but wouldn't it be simpler to save two versions of the file, one with the answer layer and one without ?
Anyway, Ghostscript can honour Optional Content, so if you can save a PDF file (not PostScript or EPS) from Illustrator, it may be that the layers will persist into the PDF as Optional Content. I suspect they will going by a quick Google. In that case you might be able to run the file through Ghostscript, telling it not to honour the Optional Content portion, and get a PDF file without it present.
Another solution (again limited to PDF) would be to open the PDF file with an editing application such as Acrobat Pro, and simply delete the bits you don't want. Deletion of that kind is relatively reliable.
It still feels like rather a long-winded way to get a PDF file with some of the content removed though. I can't help feeling that just saving two versions from the creating application would be easier.

LaTeX for Chemistry on my website

I am programming a website on the subject of chemistry and for obvious reasons I also have to include structural and molecular formulas on that site. I want to have as few images as possible on the site and would therefore like to know how I can compile LaTeX code on my website, so I can show everything I could do in LaTeX itself.
Thanks in advance.
As outlined in a previous comment, Chemistry.SE has enabled mhchem in MathJax to allow the rendering of simple formulas and reaction equations. The MathJax documentation actually gives some directions.
As far as structures of organic molecules are concerned, I usually draw them using BkChem and export them as PNG images.
If I understand you correctly, you would like to avoid the images themselves and not just the act of drawing. Therefore, the idea of generating the drawings from a linear representation (InChI, SMILES) using Open Babel will probably not convince you.
As a matter of fact, it is possible to create structures in LaTeX using chemfig, and there have been requests to support this package in MathJax. However, it seems that so far the strong dependence of chemfig on TikZ has prevented this.
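For illustration only, assuming the mhchem extension is loaded in MathJax, a reaction equation could be written with the `\ce` macro as below. The particular reaction is just an example and does not come from the question.

```latex
$\ce{2H2 + O2 -> 2H2O}$
```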

Feature extraction from Image metadata

I am working on a security problem, where I am trying to identify malicious images. I have to mine for attributes from images (most likely from the metadata) that can be fed into Weka to run various machine learning algorithms, in order to detect malicious images.
Since the image metadata can be corrupted in various different ways, I am finding it difficult to identify the features to look at in the image metadata, which I can quantify for the learning algorithms.
I had earlier used information like pixel info etc using tools like ImageJ to help me classify images, however I am looking for a better way (with regards to the security) to identify and quantify features from the image/image-metadata.
Any suggestion on the tools and the features?
As mentioned before this is not a learning problem.
The problem is that one exploit is not *similar* to another exploit. They exploit individual, separate bugs in individual, different (!) libraries, things such as missing bounds checking. It's not so much a property of the file, but more of the library that uses it. 9 out of 10 libraries will not care. One will misbehave because of a programming error.
The best you can do to detect such files is to write the most pedantic and at the same time most robust format verifier you can come up with, and reject any image that doesn't 1000% fit the specifications. Assuming that the libraries do not have errors in processing images that are actually valid.
I would strongly recommend you start by investigating how the exploits actually work. Understanding what you are trying to "learn" may guide you to some way of detecting them in general (or to understanding why no general detection is possible ...).
Here is a simple example of the ideas of how one or two of these exploits might work:
Assume we have a very simple file format, like BMP. For compression, it has support for a simple run length encoding, so that identical pixels can be efficiently stored as (count x color pairs). Does not work well with photos, but is quite compact for line art. Consider the following image data:
Width: 4
Height: 4
Colors: 1 = red, 2 = blue, 3 = green, 4 = black
Pixel data: 2x1 (red), 4x2 (blue), 2x3, 5x1, 1x0, 4x1
How many errors in the file do you spot? They may cause some trusting library code to fail, but any modern library (written with this kind of attack in mind, and with the knowledge that files may have been corrupted by transmission or hard disk errors) should just skip that and maybe even produce a partial image. See, maybe it was not an attack, but just a programming error in the program that produced the image...
Heck, not even every out-of-bounds use has to be an attack. Think of CDs. Everybody used "overburning" at some time to put more data on a CD than was meant by the specifications. Yes, some drive might crash because you overburned a CD. But I wouldn't consider all the CDs with more than 650 MB to be attacks, just because they broke the Yellow Book specifications of what a CD is.
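A pedantic verifier for this toy format could look something like the sketch below (Python). The function name `validate_rle` and the choice of checks are assumptions for illustration. The toy format does not say whether a run may cross a row boundary, so that is not checked.

```python
def validate_rle(width, height, palette, runs):
    """Return a list of error messages for toy run-length pixel data.

    `runs` is a list of (count, colour) pairs and `palette` is the
    collection of defined colour numbers. An empty list means the data
    fits the declared image size exactly.
    """
    errors = []
    expected = width * height
    total = 0
    overflow_reported = False
    for index, (count, colour) in enumerate(runs):
        if count < 1:
            errors.append(f"run {index}: count {count} is not positive")
        if colour not in palette:
            errors.append(f"run {index}: colour {colour} is not in the palette")
        total += max(count, 0)
        # Report the first run that pushes the data past the image size.
        if total > expected and not overflow_reported:
            errors.append(f"run {index}: pixel total {total} exceeds {expected}")
            overflow_reported = True
    if total < expected:
        errors.append(f"pixel total {total} is short of {expected}")
    return errors


# The example image data from above.
palette = {1: "red", 2: "blue", 3: "green", 4: "black"}
runs = [(2, 1), (4, 2), (2, 3), (5, 1), (1, 0), (4, 1)]
problems = validate_rle(4, 4, palette, runs)
```

With these two checks the verifier flags the undefined colour 0 and the fact that the runs add up to 18 pixels in a 16-pixel image. A stricter reading of the format might find more.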

How can I process a -dynamic- videostream and find the (relative) location of a "match" in that videostream?

As the question states: how is it possible to process some dynamic videostream? By dynamic, I actually mean that I would like to just process whatever is on my screen. So the image array should be some sort of "continuous screenshot".
I'd like to process the video / images based on certain patterns. How would I go about this?
It would be perfect if there already was (and there probably is) existing components. I need to be able to use the location of the matches (or partial matches). A .NET component for the different requirements could also be useful I guess...
You will probably need to read up on Computer Vision before you attempt this. There is nothing really special about video that separates it from still images. The process you might want to look at is:
Acquire the data
Split the data into individual frames
Remove noise (Use a Gaussian filter)
Segment the image into the sections you want
Remove the connected components of the image
Find a way to quantize the image for comparison
Store/match the components to a database of previously found components
With this database/datastore you'll have information on matches later in the database. Do what you like with it.
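As a rough illustration of the segmentation and connected-component steps, and of how they yield the location of each candidate region, here is a minimal pure-Python sketch. It assumes a frame is already available as a list of rows of grey values, uses a plain threshold as the segmentation, and the name `find_components` is hypothetical. A library such as OpenCV would normally do this work.

```python
from collections import deque


def find_components(frame, threshold):
    """Locate 4-connected regions of pixels with value >= threshold.

    `frame` is a list of rows of numbers. Returns one bounding box
    (left, top, right, bottom) per region, in scan order.
    """
    height = len(frame)
    width = len(frame[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or frame[y][x] < threshold:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(x, y)])
            seen[y][x] = True
            left = right = x
            top = bottom = y
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy),
                               (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx]
                            and frame[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes


# Hypothetical 5x4 frame containing two bright regions.
frame = [
    [0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0],
    [0, 9, 0, 0, 7],
    [0, 0, 0, 0, 7],
]
boxes = find_components(frame, threshold=5)
```

Each bounding box gives the position of one region relative to the top-left corner of the frame, which is the location information the matching step can then work with.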
As far as software goes:
Most of these algorithms are not too difficult. You can write them yourself. They do take a bit of work though.
OpenCV does a lot of the basic stuff, but it won't do everything for you
Java: JAI, JHLabs [for filters], Various other 3rd party libraries
C#: AForge.net

How to programmatically manipulate an EPS file

I am looking for libraries that would help in programmatically manipulating EPS (Encapsulated PostScript) files. Basically, what I want to do is the following:
Show / Hide preexisting layers in the EPS file (toggle them on and off)
Fill (color) named shapes in the EPS file
Retrieve coordinates of named points in the EPS file
draw shapes on a new layer in the EPS file
on a server, without user interaction (scripting Adobe Illustrator won't work)
I am aware of how the EPS file format is based on the PostScript language and must therefore be interpreted - for creating simple drawings from scratch this is rather easy. But for actually modifying existing files, I guess you need a library that interprets the file and provides some kind of "DOM" for manipulation.
Can I even have named shapes and points inside an EPS file?
EDIT: Assuming I had the layers saved in separate EPS files. Or better still: Just the "data" part of the layers. Could I then concatenate this stuff to create a new EPS file? And append drawing commands? Fill existing named objects?
This is extremely difficult and here is why: a PS file is a program whose execution results in pixels put on a page. Instructions in a PS program are at the level of "draw a line using the current pen and color" or "rotate the coordinate system by 90 degrees", but there is no notion of layers or complex objects like you would see them in a vector drawing application.
There are very few conventions in the structure of PS files to allow external programs to modify them: pages are marked separately, and font resources and media dimensions are spelled out in special comments. This is especially true for Encapsulated PostScript (EPS), which must follow these guidelines because such files are meant to be read by applications, but not for general PS as it is sent to a printer. A PS program is at a much lower level of abstraction than what you need, and there is no way to reconstruct the higher level for arbitrary PS code. In principle, a PS file could result in different output every time it is printed, because it may query its execution environment and branch based on random decisions.
Applications like Adobe Illustrator emit PS code that follows a rigid structure. There is a chance that this could be parsed and manipulated without interpreting the code. I would still suggest rethinking the current architecture: you are at too low a level of abstraction for what you need.
PDF is not manipulable since it is not possible (in general) to change any existing parts of a PDF, only to add stuff. EPS is the same as PostScript except that it has a bounding box header.
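The special comments are about the only part of an EPS file that can be read reliably without an interpreter. As a minimal sketch (Python, with the hypothetical helper name `read_bounding_box`), this is how the `%%BoundingBox` comment could be picked out. It assumes integer coordinates and skips the `(atend)` placeholder form, so a bounding box deferred to the trailer is found on a later line.

```python
import re

# Matches e.g. "%%BoundingBox: 0 0 200 100" with four integer coordinates.
_BBOX = re.compile(
    r"^%%BoundingBox:\s*(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s*$"
)


def read_bounding_box(eps_text):
    """Return (llx, lly, urx, ury) from the first usable %%BoundingBox
    comment in the EPS source text, or None if there is none."""
    for line in eps_text.splitlines():
        match = _BBOX.match(line)
        if match:
            return tuple(int(value) for value in match.groups())
    return None


# Hypothetical EPS source for illustration.
sample = (
    "%!PS-Adobe-3.0 EPSF-3.0\n"
    "%%BoundingBox: 0 0 200 100\n"
    "%%EndComments\n"
    "newpath 0 0 moveto 200 100 lineto stroke\n"
)
bbox = read_bounding_box(sample)
```

Anything beyond such header comments, including the actual drawing commands, still needs the program to be interpreted.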
The problem with doing what you want is that PS is a programming language whose output (mostly) is some kind of image. So the question could be restated as "how can I draw shapes on a new layer in the Java file". You probably need to generate the complete PS on the fly, or use another image format altogether.
I am not aware of any available libraries for this, but you may be able to build something to meet your needs based on epstool from Ghostscript/GSview.
I think your best bet is to generate a PDF from the EPS and then manipulate the PDF. Then back to EPS. PDF is much more "manipulable" than is EPS.

Resources