I see a lot of trainers in ML.NET that take different kinds of data for the "Features" column, but for the "Label" column it's mostly Single. Are there any trainers that take a Vector as features and a Vector as the label?
So that I could pass two images into the trainer, the input image and the output image I want to get as a result, and train the network to transform images? So that next time it will "predict" the output image from an input image?
Or it must be done another way?
Please help. I have images of cars, trucks, and bikes. I want to train YOLO or another object-detection model on them that can output car_owner, truck_owner, car_model, truck_model, truck_km, car_km, etc. What I know is that in object-detection models we can label the images for YOLO using just one piece of information as the label: we can annotate the car or truck as an object within an image, and that becomes the label used to train the model. But I want my model to output the information mentioned above, like owner, model, km, etc.
How to do this?
Is there any tool to label the images with this information and train YOLO or any other model? (Except the Image Labeler from MATLAB.)
I have CT images from patients and applied a CNN to those images to predict diseases. I would like to combine my clinical data with my image data in a CNN approach; is that possible? My clinical data contains information like age, sex, dates, and smoking status, all encoded as numbers, e.g. 1 for smoker and 0 for non-smoker.
Have a look at, for example, this paper where they combine features from a CNN with text data. In that paper the CNN is already pre-trained (i.e., the CNN is essentially a featurizer), but you could clearly learn it all in one go. The idea would always be to:
1. Run your input image through the convolution/subsampling layers;
2. Just before your final fully connected (decision) layer, concatenate the other features you have available;
3. Feed everything (pre-processed image and other features) into the decision layer.
So the answer is "yes, certainly"; the details depend on which framework you are using.
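For instance, a minimal sketch in Keras (the image size, layer sizes and the five clinical variables are placeholders, not recommendations) could look like this:

```python
# Two-branch model: CNN over the CT slice, concatenated with clinical variables
# just before the fully connected decision layers. Sizes are illustrative only.
from tensorflow.keras import layers, models

# Image branch: convolution/subsampling layers over e.g. a 128x128 greyscale slice.
img_in = layers.Input(shape=(128, 128, 1), name="ct_image")
x = layers.Conv2D(16, 3, activation="relu")(img_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.MaxPooling2D()(x)
x = layers.Flatten()(x)

# Clinical branch: e.g. 5 numeric variables (age, sex, smoker, ...).
clin_in = layers.Input(shape=(5,), name="clinical")

# Concatenate image features with clinical features before the decision layer.
merged = layers.concatenate([x, clin_in])
h = layers.Dense(64, activation="relu")(merged)
out = layers.Dense(1, activation="sigmoid", name="disease")(h)

model = models.Model(inputs=[img_in, clin_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit([images, clinical_data], labels, ...)
```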
As far as I know, CNNs are extremely well suited to image data, but not to other kinds of data.
A solution to your problem would be to 'color' your images with the clinical data. (In image-recognition CNNs, an input image is usually split into 3 color layers: red, green and blue. See: http://cs231n.github.io/convolutional-networks/)
Let's say your input data is a 32x32 pixel 8-bit greyscale image (so 1 color layer). I propose adding each clinical variable as an extra 'color' layer, where all values within the same layer are identical.
Whether each layer should be the same size as the image, or whether you can get away with a single pixel, I'm not sure, but at least you can treat the clinical data as an 'image' alongside the CT images.
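A rough NumPy sketch of that idea, assuming a 2-D greyscale scan and a small list of already-normalised clinical values (the function name is just illustrative):

```python
# Each clinical variable becomes an extra channel whose pixels all share the same value.
import numpy as np

def add_clinical_channels(ct_image, clinical_values):
    """ct_image: (H, W) greyscale array; clinical_values: 1-D sequence of numbers."""
    h, w = ct_image.shape
    channels = [ct_image.astype(np.float32)]
    for v in clinical_values:
        channels.append(np.full((h, w), float(v), dtype=np.float32))  # constant 'color' layer
    return np.stack(channels, axis=-1)  # shape (H, W, 1 + number_of_clinical_variables)

# Example: a 32x32 scan plus [age_normalised, smoker] -> a (32, 32, 3) 'image'.
x = add_clinical_channels(np.zeros((32, 32)), [0.63, 1.0])
```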
Can a CNN be created that outputs an image with a feature added to the input image?
For example, if an image of a person's face is the input, it outputs an image of that person's face wearing glasses.
There are several options, but basically, in the same way that you have one input for every pixel, you must have one output for every pixel of the output image.
In MLPs you must have the same number of neurons in the input layer as in the output layer.
In CNNs you can have convolutional layers at the beginning and deconvolutional (transposed-convolution) layers after that.
Take a look at this paper (it is awesome) on creating very realistic images from other images (for example, satellite and map views in Google Maps). It is a neural network that tries to solve the problem while also trying to create images that another neural network cannot distinguish from real images (the source code is also available):
https://phillipi.github.io/pix2pix/
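To make the convolution-then-deconvolution idea above concrete, here is a bare-bones encoder-decoder sketch in Keras (this is not pix2pix itself; the image size and layer widths are arbitrary):

```python
# Convolutions downsample the input image, transposed convolutions upsample back to an
# output image of the same size, so the network trains on (input image, target image) pairs.
from tensorflow.keras import layers, models

inp = layers.Input(shape=(64, 64, 3))                                               # input image
x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(inp)         # 32x32
x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)           # 16x16
x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(x)  # 32x32
x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)  # 64x64
out = layers.Conv2D(3, 1, activation="sigmoid")(x)                                  # output image

model = models.Model(inp, out)
model.compile(optimizer="adam", loss="mae")
# model.fit(input_images, target_images, ...)  # e.g. faces -> faces with glasses
```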
To add to the answer above, another way of doing this is neural style transfer, where we feed two images to a CNN, which then generates a new image combining the content of the second image with the style of the first. Check out this paper for further details: https://arxiv.org/abs/1508.06576
We could, of course, always use GANs to achieve full perfection.
I trained up a very vanilla CNN using keras/theano that does a pretty good job of detecting whether a small (32X32) portion of an image contains a (relatively simple) object of type A or B (or neither). The output is an array of three numbers [prob(neither class), prob(A), prob(B)]. Now, I want to take a big image (512X680, methinks) and sweep across the image, running the trained model on each 32X32 sub-image to generate a feature map that's 480X648, at each point consisting of a 3-vector of the aforementioned probabilities. Basically, I want to use my whole trained CNN as a (nonlinear) filter with three-dimensional output. At the moment, I am cutting each 32X32 out of the image one at a time and running the model on it, then dropping the resulting 3-vectors in a big 3X480X648 array. However, this approach is very slow. Is there a faster/better way to do this?
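To make the question concrete, here is roughly what the current patch-by-patch loop looks like (a sketch only, assuming a Keras model with the 3-way softmax output described; add or reorder the channel axis to match what your model expects):

```python
# Sweep a trained 32x32 classifier over a large image, one window at a time,
# filling a 3 x out_h x out_w map of class probabilities.
import numpy as np

def sweep(model, image, win=32):
    h, w = image.shape[:2]
    out_h, out_w = h - win + 1, w - win + 1
    feature_map = np.zeros((3, out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + win, j:j + win]
            patch = patch[np.newaxis, ..., np.newaxis]   # add batch and channel axes if needed
            feature_map[:, i, j] = model.predict(patch, verbose=0)[0]
    return feature_map
```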
I am extracting all images from given PDF files (containing real estate synopses) as JPEGs using the pdfimages tool. Now I want to automatically distinguish between photos and other pictures, like maybe the broker's logo. How should I do this?
Is there an open tool that can distinguish between photos and clipart/line drawings etc., like Google image search does?
Is there an open tool that gives me the number of colors used in a given JPEG?
I know this will bear a certain uncertainty, but that's okay.
I would look at colour distribution. In a logo or line drawing, the colours are likely to be densely packed, or "too" evenly spread in the case of gradients. Alternatively, you could look at the frequency distribution of the image.
You can solve your problem in two steps: (1) extract some kind of information from the image and (2) train a classifier that can distinguish the two types of images:
1 - Feature Extraction
In this step you will have to write a program/function that takes an image as input and returns a numeric vector describing its visual information. As koan points out in his answer, the color distribution contains a lot of useful information. So I would try the following measures (a rough code sketch of the first two follows the list):
* Histogram of each color channel (Red, Green, Blue), as this is a basic description of the color distribution of the image;
* Mean, standard deviation and other statistical moments of each histogram. This should give you information on how the colors are distributed in the image. For a drawing, such as a logo, the color distribution should be significantly different from that of a photo;
* Fourier descriptors. In a drawing you will probably find a lot of edges, whereas in a photo this is not expected. With Fourier descriptors you can capture this kind of information.
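Here is the sketch mentioned above, covering the per-channel histograms and simple moments (`extract_features` is just an illustrative name; the Fourier descriptors are left out):

```python
# Extract a numeric feature vector from an image: per-channel colour histograms
# plus mean and standard deviation of each channel.
import numpy as np
from PIL import Image

def extract_features(path, bins=16):
    img = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32)
    features = []
    for c in range(3):                                       # R, G, B channels
        channel = img[..., c].ravel()
        hist, _ = np.histogram(channel, bins=bins, range=(0, 255), density=True)
        features.extend(hist)                                # colour distribution
        features.extend([channel.mean(), channel.std()])     # simple statistical moments
    return np.array(features)
```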
2 - Classification
In this step you will train some sort of classifier. Basically, get a set of images and manually label each one as a drawing or a photo. Also, use the extraction function that you wrote in step 1 to extract a vector from each image. This will be your training set. The training set will be used as input to train a classifier. As Neil N commented, a neural network may be overkill (or maybe not?), but there are a lot of classifiers you can use (e.g. k-NN, SVM, decision trees). You don't have to implement the classifier yourself, as you can use machine learning software such as Weka.
Finally, after you have trained your classifier, extract the vector from the image you want to test. Use this vector as input to the classifier to get a prediction of whether the image is a photo or a logo.
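A sketch of this step using scikit-learn in Python instead of Weka (the `photo_paths`/`logo_paths` lists of manually labelled images are hypothetical, and `extract_features` is the sketch from step 1):

```python
# Train a simple SVM on the extracted feature vectors and evaluate on a held-out split.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X = np.array([extract_features(p) for p in photo_paths + logo_paths])
y = np.array([0] * len(photo_paths) + [1] * len(logo_paths))   # 0 = photo, 1 = drawing/logo

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = SVC(kernel="rbf").fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))

# Predict for a new image:
# clf.predict([extract_features("some_new_image.jpg")])
```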
A simpler solution is to automatically send the image to Google image search with the 'similar images' setting on, and see if Google sends back primarily PNG results or JPEG results.