I am working on a security problem where I am trying to identify malicious images. I have to mine attributes from images (most likely from the metadata) that can be fed into Weka to run various machine learning algorithms, in order to detect malicious images.
Since the image metadata can be corrupted in many different ways, I am finding it difficult to identify features in the metadata that I can quantify for the learning algorithms.
I had earlier used information such as pixel data, via tools like ImageJ, to help me classify images; however, I am looking for a better way (from a security standpoint) to identify and quantify features from the image and its metadata.
Any suggestions on tools and features?
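For reference, here is the kind of extraction I have in mind, sketched with the Exiv2 C++ library writing a Weka ARFF file. The two features (tag count and total tag-value size) are just placeholders, not a claim about what actually discriminates malicious images:

    // Sketch: emit per-image metadata features as a Weka ARFF file.
    // Assumes the Exiv2 library; the chosen features are examples only.
    #include <exiv2/exiv2.hpp>
    #include <fstream>
    #include <iostream>

    int main(int argc, char** argv) {
        std::ofstream arff("images.arff");
        arff << "@relation images\n"
                "@attribute tag_count numeric\n"
                "@attribute total_value_bytes numeric\n"
                "@attribute class {benign,malicious}\n"
                "@data\n";
        for (int i = 1; i < argc; ++i) {
            try {
                auto image = Exiv2::ImageFactory::open(argv[i]);
                image->readMetadata();
                long tagCount = 0, valueBytes = 0;
                for (const auto& datum : image->exifData()) {
                    ++tagCount;
                    valueBytes += datum.size(); // oversized values can be a signal
                }
                arff << tagCount << "," << valueBytes << ",?\n"; // label filled in later
            } catch (const Exiv2::Error&) {
                std::cerr << argv[i] << ": unparseable metadata (itself a feature)\n";
            }
        }
    }

Unparseable metadata seems worth recording as a feature in its own right, since corrupted structures are exactly what I am trying to flag.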
As mentioned before, this is not a learning problem.
The problem is that one exploit is not *similar* to another exploit. They exploit individual, separate bugs in individual, different (!) libraries, things such as missing bounds checking. It's not so much a property of the file as of the library that uses it. Nine out of ten libraries will not care. One will misbehave because of a programming error.
The best you can do to detect such files is to write the most pedantic and at the same time most robust format verifier you can come up with, and reject any image that doesn't fit the specifications 1000%, assuming, of course, that the libraries have no errors in processing images that are actually valid.
I would strongly recommend you start by investigating how the exploits actually work. Understanding what you are trying to "learn" may guide you to some way of detecting them in general (or to understanding why no general detection is possible...).
Here is a simple example of how one or two of these exploits might work:
Assume we have a very simple file format, like BMP. For compression, it supports a simple run-length encoding, so that identical pixels can be stored efficiently as (count x color) pairs. This does not work well for photos, but is quite compact for line art. Consider the following image data:
Width: 4
Height: 4
Colors: 1 = red, 2 = blue, 3 = green, 4 = black
Pixel data: 2x1 (red), 4x2 (blue), 2x3, 5x1, 1x0, 4x1
How many errors in the file do you spot? They may cause some trusting library code to fail, but any modern library (written with knowledge of this kind of attack, and aware that files may be corrupted by transmission or hard-disk errors) should just skip the bad data and maybe even produce a partial image. See, maybe it was not an attack, but just a programming error in the program that produced the image...
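To make the "pedantic verifier" idea concrete, here is a sketch of a defensive decoder for the toy format above (a hypothetical format, not real BMP RLE). It rejects both errors in the sample data: the runs cover 2+4+2+5+1+4 = 18 pixels where width x height is only 16, and color index 0 is not in the 1..4 palette.

    // Sketch: pedantic decoder for the toy RLE format above.
    #include <cstdint>
    #include <stdexcept>
    #include <vector>

    struct Run { uint8_t count; uint8_t color; };

    std::vector<uint8_t> decode(unsigned width, unsigned height,
                                unsigned paletteSize,
                                const std::vector<Run>& runs) {
        const size_t expected = size_t(width) * height;
        std::vector<uint8_t> pixels;
        pixels.reserve(expected);
        for (const Run& r : runs) {
            if (r.color == 0 || r.color > paletteSize)
                throw std::runtime_error("color index outside palette");
            if (pixels.size() + r.count > expected)
                throw std::runtime_error("run data exceeds width*height");
            pixels.insert(pixels.end(), r.count, r.color);
        }
        if (pixels.size() != expected)
            throw std::runtime_error("run data does not fill the image");
        return pixels;
    }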
Heck, not every out-of-bounds use has to be an attack, either. Think of CDs. Everybody used "overburning" at some point to put more data on a CD than the specifications intended. Yes, some drive might crash because you overburned a CD. But I wouldn't consider every CD with more than 650 MB an attack just because it broke the Yellow Book specification of what a CD is.
Is there a command-line tool to remove all spot color channels from a vector input image (the type can be AI or EPS) and keep only the CMYK or RGB color channels?
What I've been able to come up with so far is using Ghostscript's tiffsep device and then recombining the color-channel images into one image using ImageMagick's -combine option. The drawback of this method is that it is quite complicated, and I end up with a TIFF image instead of the original (vector) format.
'Image' has a defined meaning in PostScript: it means a bitmap, a raster. I think, from the context, that you mean something more general.
The simple answer is no, in general you can't do this, and I don't know of any tool which will.
The reason is that to do so would lose information; the marks defined in Separation or DeviceN space would be lost entirely, and it's generally regarded as a Bad Idea to discard random parts of a document.
Perhaps you could explain what you are trying to achieve (i.e. why you are doing this), and it might be possible to suggest an alternative method.
If you are a competent C programmer you could produce a Ghostscript subclass device, using the existing FILTER device (in gdevflt.c) as a template. That device looks at the type of operation and either passes it on to the output device or throws it away. It would be reasonably simple to look at the current colour space and discard anything in a Separation or DeviceN space. If you then use the pdfwrite/ps2write/eps2write output device, you'd get an EPS, PostScript program, or PDF file as the output.
Whether you go down this route, continue with what you have, or find an alternative approach, there are a couple of things you need to think about: how do you plan to tackle Separation inks with process colour names? E.g. /Separation /Black. What about DeviceN spaces where some of the inks are process colours? E.g. a duotone of Black and a Pantone ink. Should these be preserved or discarded?
Your current approach will use the parts of the object which mark process plates, but not those which mark spot colours, which could give some very peculiar results.
[EDIT]
PDF, PostScript and EPS don't have 'layers' (PDF has a feature, Optional Content, which the specification describes using the term 'layers', but that's all).
Applications such as Photoshop and Illustrator can have layers, but in general whatever they export to has to have those 'layers' converted into something else. That 'something else' depends on what you are saving the file as.
Part of the problem is that you are apparently trying to deal with three different kinds of input: you mention Illustrator (PDF, more or less), Photoshop (a raster image) and EPS (PostScript). There is little common ground between the three; is there a reason to support all of them?
If you are content to stick with just Illustrator you might be able to do something with Optional Content. I'm not terribly familiar with modern versions of Illustrator, but wouldn't it be simpler to save two versions of the file, one with the answer layer and one without ?
Anyway, Ghostscript can honour Optional Content, so if you can save a PDF file (not PostScript or EPS) from Illustrator, the layers may persist into the PDF as Optional Content. I suspect they will, going by a quick Google. In that case you might be able to run the file through Ghostscript, telling it not to honour the Optional Content portion, and get a PDF file without it present.
Another solution (again limited to PDF) would be to open the PDF file with an editing application such as Acrobat Pro, and simply delete the bits you don't want. Deletion of that kind is relatively reliable.
It still feels like rather a long-winded way to get a PDF file with some of the content removed though. I can't help feeling that just saving two versions from the creating application would be easier.
I saw the High-Performance Graphics presentation "High-Performance Software Rasterization on GPUs" and I was very impressed by the work/analysis/comparison:
http://www.highperformancegraphics.org/previous/www_2011/media/Papers/HPG2011_Papers_Laine.pdf
http://research.nvidia.com/sites/default/files/publications/laine2011hpg_paper.pdf
My background is CUDA; I started learning OpenGL two years ago to develop the 3D interface of EMM-Check, a field-of-view analysis program which checks whether a vehicle fulfills a specific standard or not. Essentially, you load a vehicle (or different parts of one), then you can move it as a whole or part by part, add mirrors/cameras, analyse the point of view and the shadows from the driver's point of view, etc.
We are dealing with some transparent elements (mainly the fields of view, but the vehicles themselves might be transparent too), so I wrote a rough algorithm to sort the elements to be rendered on the fly (at the primitive level, a kind of painter's algorithm). Of course there are cases in which it easily fails, although it is enough for most cases.
For this reason I started googling and found many techniques, like (dual) depth peeling, the A/R/K/F-buffer, etc.
But it looks like all of them suffer at high resolutions and/or with large numbers of triangles.
Since we also deal with millions of triangles (up to 10 million, more or less), I was looking for something else and ended up at software renderers: compared to the hardware ones they offer full programmability, but they are slower.
So I wonder whether it might be possible to implement something hybrid: use the hardware renderer for the opaque elements and a software renderer (CUDA/OpenCL) for the transparent elements, then combine the two results.
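The combining step itself looks simple enough per pixel; here is a sketch of what I have in mind (all names hypothetical, assuming the software pass delivers its transparent fragments already sorted front to back, and the hardware pass delivers opaque colour and depth):

    // Sketch: composite depth-sorted transparent fragments from a
    // software pass over the opaque color/depth from a hardware pass.
    #include <vector>

    struct Rgba { float r, g, b, a; };
    struct Fragment { float depth; Rgba color; };

    Rgba compositePixel(const Rgba& opaqueColor, float opaqueDepth,
                        const std::vector<Fragment>& sortedFrags) {
        Rgba out{0, 0, 0, 0};        // accumulated color
        float transmittance = 1.0f;  // how much of what is behind still shows
        for (const Fragment& f : sortedFrags) {
            if (f.depth >= opaqueDepth) break;  // hidden by opaque geometry
            out.r += transmittance * f.color.a * f.color.r;
            out.g += transmittance * f.color.a * f.color.g;
            out.b += transmittance * f.color.a * f.color.b;
            transmittance *= 1.0f - f.color.a;
        }
        out.r += transmittance * opaqueColor.r;  // opaque shows through the rest
        out.g += transmittance * opaqueColor.g;
        out.b += transmittance * opaqueColor.b;
        out.a = 1.0f;
        return out;
    }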
Or maybe a simple ray-tracing algorithm in CUDA/OpenCL (no complex visual effects required, just position, colour, simple lighting and proper transparency) might be much simpler from this point of view, and also give us a lot of freedom/flexibility in the future?
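From what I understand, the core of such a ray tracer is small: per ray it is essentially one ray/triangle test repeated over (an acceleration structure of) the scene. A plain C++ sketch of the standard Möller-Trumbore test, just to fix ideas (a CUDA/OpenCL version would be structured the same way):

    // Sketch: Moller-Trumbore ray/triangle intersection.
    #include <cmath>
    #include <optional>

    struct Vec3 { float x, y, z; };
    static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
    static Vec3 cross(Vec3 a, Vec3 b) {
        return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
    }
    static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

    // Returns the ray parameter t of the hit, or nothing on a miss.
    std::optional<float> intersect(Vec3 orig, Vec3 dir, Vec3 v0, Vec3 v1, Vec3 v2) {
        const float EPS = 1e-7f;
        Vec3 e1 = sub(v1, v0), e2 = sub(v2, v0);
        Vec3 p = cross(dir, e2);
        float det = dot(e1, p);
        if (std::fabs(det) < EPS) return std::nullopt;  // ray parallel to triangle
        float inv = 1.0f / det;
        Vec3 t = sub(orig, v0);
        float u = dot(t, p) * inv;
        if (u < 0.0f || u > 1.0f) return std::nullopt;
        Vec3 q = cross(t, e1);
        float v = dot(dir, q) * inv;
        if (v < 0.0f || u + v > 1.0f) return std::nullopt;
        float tHit = dot(e2, q) * inv;
        return tHit > EPS ? std::optional<float>(tHit) : std::nullopt;
    }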
I did not find anything on the net regarding this... is there perhaps some particular obstacle?
I would like to hear every single thought/tip/idea/suggestion that you have regarding this.
PS: I also found "Single Pass Depth Peeling via CUDA Rasterizer" by Liu, but the solution from the first paper seems far faster:
http://webstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph09/content/talks/062-liu.pdf
I might suggest that you look at OpenRL, which will give you hardware-accelerated ray tracing.
My goal is to import an EPS file into an app (the language is C++) to create a 3D object. I am looking for a library/tool which will help me parse the EPS into a list of primitives (circles, lines, paths, etc., as in SVG) or even an array of contours. I've already tried converting EPS to SVG first, using the pstoedit and UniConvertor tools, but sometimes both tools convert incorrectly and lose data, so I cannot say that this approach is acceptable.
Does anyone have experience in this area or have any suggestions?
This is a big project. For starters you will need a PostScript interpreter; there is no alternative to this, since an EPS can contain very nearly anything.
Rather than writing your own interpreter, I would suggest you use an existing one; in fact I would suggest Ghostscript, as it is the only GPL PostScript interpreter I know of.
You can write a Ghostscript device containing methods which will be executed whenever the relevant operation is interpreted from the input. There is an existing (very limited, incomplete) SVG output device which would get you started.
You are going to have to handle a lot of different kinds of operations if you want a general-purpose solution. For instance, PostScript doesn't have a circle primitive; its curves are all Béziers, and there are different kinds of line joins. You will need to consider what to do with images and presumably text (possibly discard these) and shading patterns. You will have to at least understand the various colour spaces which can be used, even if you don't plan on utilising them yourself.
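For example, if your target is a contour array you will end up flattening those Béziers into line segments yourself. A minimal sketch (fixed uniform sampling; a real implementation would subdivide adaptively to a flatness tolerance):

    // Sketch: flatten a cubic Bezier into a polyline for a contour array.
    #include <cstddef>
    #include <vector>

    struct Point { double x, y; };

    std::vector<Point> flattenCubicBezier(Point p0, Point p1, Point p2, Point p3,
                                          std::size_t segments = 16) {
        std::vector<Point> poly;
        poly.reserve(segments + 1);
        for (std::size_t i = 0; i <= segments; ++i) {
            double t = double(i) / segments, u = 1.0 - t;
            double b0 = u * u * u, b1 = 3 * u * u * t,
                   b2 = 3 * u * t * t, b3 = t * t * t;
            poly.push_back({b0 * p0.x + b1 * p1.x + b2 * p2.x + b3 * p3.x,
                            b0 * p0.y + b1 * p1.y + b2 * p2.y + b3 * p3.y});
        }
        return poly;
    }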
Given that PostScript is inherently 2D I don't really see how you are going to construct a 3D object, but that's a different problem.
I am searching for an algorithm to determine whether real-time audio input matches one of 144 given (and comfortably distinct) phoneme pairs.
Preferably the lowest level that does the job.
I'm developing radical / experimental musical training software for iPhone / iPad.
My musical system comprises 12 consonant phonemes and 12 vowel phonemes, demonstrated here. That makes 144 possible phoneme pairs. The student has to sing the correct phoneme pair 'laa duu bee' etc in response to visual stimulus.
I have done a lot of research into this, and it looks like my best bet may be to use one of the iOS Sphinx wrappers (iPhone App › Add voice recognition? is the best source of information I have found). However, I can't see how I would adapt such a package. Can anyone with experience using one of these technologies give a basic rundown of the steps that would be required?
Would training by the user be necessary? I would have thought not, as it is such an elementary task compared with full language models of thousands of words and a far larger and more subtle phoneme base. However, it would be acceptable (though not ideal) to have the user train 12 phoneme pairs: { consonant1+vowel1, consonant2+vowel2, ..., consonant12+vowel12 }. The full 144 would be too burdensome.
Is there a simpler approach? I feel like using a fully featured continuous speech recogniser is using a sledgehammer to crack a nut. It would be far more elegant to use the minimum technology that would solve the problem.
So really I'm hunting for any open source software that recognises phonemes.
PS: I need a solution which runs pretty much in real time, so that even as they are singing the note, it first blinks to show that it picked up the phoneme pair that was sung, and then it glows to show whether they are singing the correct pitch.
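To make "minimum technology" concrete, the simplest thing I can imagine is per-pair template matching: record one exemplar per phoneme pair, extract feature frames (e.g. MFCCs, computed elsewhere), and compare the incoming frames against each template with dynamic time warping, taking the best-scoring template. A sketch of the DTW core (everything here is hypothetical, not from any particular library):

    // Sketch: DTW distance between an utterance and a stored template,
    // both given as sequences of equal-length feature vectors.
    #include <algorithm>
    #include <cmath>
    #include <limits>
    #include <vector>

    using Frame = std::vector<float>;

    static float frameDist(const Frame& a, const Frame& b) {
        float d = 0;
        for (std::size_t i = 0; i < a.size(); ++i) {
            const float diff = a[i] - b[i];
            d += diff * diff;
        }
        return std::sqrt(d);
    }

    float dtwDistance(const std::vector<Frame>& x, const std::vector<Frame>& y) {
        const std::size_t n = x.size(), m = y.size();
        const float INF = std::numeric_limits<float>::infinity();
        std::vector<std::vector<float>> cost(n + 1, std::vector<float>(m + 1, INF));
        cost[0][0] = 0;
        for (std::size_t i = 1; i <= n; ++i)
            for (std::size_t j = 1; j <= m; ++j)
                cost[i][j] = frameDist(x[i - 1], y[j - 1]) +
                             std::min({cost[i - 1][j],        // skip a frame of x
                                       cost[i][j - 1],        // skip a frame of y
                                       cost[i - 1][j - 1]});  // match
        return cost[n][m];
    }

Classification would then pick the phoneme pair whose template minimises dtwDistance against the live input. Whether that is robust enough across singers is exactly what user training would have to compensate for.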
If you are looking for a phone-level open-source recogniser, then I would recommend HTK. Very good documentation is available for this tool in the form of the HTK Book, which also contains an entire chapter dedicated to building a phone-level real-time speech recogniser. From your problem statement above, it seems to me that you might be able to rework that example into your own solution. Possible pitfalls:
Since you want to build a phone-level recogniser, the amount of data needed to train the phone models would be very large. Also, your training database should be balanced in terms of the distribution of the phones.
Building a speaker-independent system would require data from more than one speaker. And lots of that too.
Since this is open source, you should also check the licensing info for any additional details about shipping the code. A good alternative would be to use the on-phone recorder and then send the recorded waveform over a data channel to a server for the recognition, pretty much like what Google does.
I have a little bit of experience with this type of signal processing, and I would say that this is probably not the type of finite question that can be answered definitively.
One thing worth noting is that although you may restrict the phonemes you are interested in, the space of possibilities remains the same (i.e. infinite-ish). User training might help the algorithms along a bit, but useful training takes quite a bit of time, and it seems you are averse to too much of that.
Using Sphinx is probably a great start on this problem. I haven't gotten very far in the library myself, but my guess is that you'll be working with its source code yourself to get exactly what you want. (Hooray for open source!)
...using a sledgehammer to crack a nut.
I wouldn't label your problem a nut, I'd say it's more like a beast. It may be a different beast than natural language speech recognition, but it is still a beast.
All the best with your problem solving.
Not sure if this would help: check out OpenEars' LanguageModelGenerator. OpenEars uses Sphinx and other libraries.
http://www.hfink.eu/matchbox
This page links to both a YouTube video demo and the GitHub source.
I'm guessing it would still be a lot of work to mould it into the shape I'm after, but it definitely does do a lot of the work already.
As the question states: how is it possible to process a dynamic video stream? By 'dynamic', I actually mean that I would like to process whatever is on my screen, so the image array should be a sort of continuous screenshot.
I'd like to process the video / images based on certain patterns. How would I go about this?
It would be perfect if there were already existing components (and there probably are). I need to be able to use the locations of the matches (or partial matches). A .NET component for the different requirements could also be useful, I guess...
You will probably need to read up on computer vision before you attempt this. There is nothing really special about video that separates it from still images. The process you might want to look at is:
Acquire the data
Split the data into individual frames
Remove noise (Use a Gaussian filter)
Segment the image into the sections you want
Extract the connected components of the image
Find a way to quantize the image for comparison
Store/match the components to a database of previously found components
With this database/datastore you'll have information on matches that you can use later. Do what you like with it.
As far as software goes:
Most of these algorithms are not too difficult. You can write them yourself. They do take a bit of work though.
OpenCV does a lot of the basic stuff, but it won't do everything for you (see the sketch below)
Java: JAI, JHLabs (for filters), various other third-party libraries
C#: AForge.net
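As a taste of OpenCV, match locations can come straight out of template matching (C++ here; .NET wrappers such as Emgu CV expose equivalent calls). The screen-capture step itself is platform-specific and assumed to happen elsewhere:

    // Sketch: find where a pattern appears in a captured frame.
    #include <opencv2/core.hpp>
    #include <opencv2/imgproc.hpp>
    #include <vector>

    // Returns the top-left corners of every location where the pattern
    // matches the frame with a normalised score above `threshold`.
    std::vector<cv::Point> findPattern(const cv::Mat& frame,
                                       const cv::Mat& pattern,
                                       double threshold = 0.9) {
        cv::Mat scores;
        cv::matchTemplate(frame, pattern, scores, cv::TM_CCOEFF_NORMED);
        std::vector<cv::Point> hits;
        for (int y = 0; y < scores.rows; ++y)
            for (int x = 0; x < scores.cols; ++x)
                if (scores.at<float>(y, x) >= threshold)
                    hits.push_back(cv::Point(x, y));
        return hits;
    }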