To my understanding, the canonical generative models (GAN, VAE) sample data from what is basically random noise, or, more specifically for the latter, from a learned latent distribution over which we have no real control. The generative process is usually to draw a value at random, feed it into the model, and get data back.
But I want this random value to have a meaning and to be constrained by my needs.
I work in particle physics, but for the sake of analogy let's say that my data is a collection of photos of bullet impacts on bulletproof glass (in reality: particle interactions in a detector).
How the glass shatters will of course depend on the bullet type, but also on the bullet's velocity and angle of incidence: one categorical variable and two continuous ones.
I want to generate more photos, but in the new photo I want to control the angle of the bullet and its velocity. In other words, I want to tell the model: "give me impacts of a .22 cal striking at 900 m/s and 10° incidence" and get in return photos that could have resulted from such an impact.
How do I do that?
I realise there may be established techniques, but I am lacking the keywords or names to search for them. Any pointer is most welcome.
I have been experimenting with object detection recently, using Faster R-CNN and YOLOv7 to train models on pre-existing datasets.
Using a UNO card dataset, I detected the type of UNO card quite accurately, based on the symbol in the top-left corner. I used an object detection approach, with the UNO cards being categorised into only 14 classes.
Based on that, I am wondering what the best approach would be to enhance the model to use for other and more comprehensive card games.
Think of card games like Munchkin, for example, which has thousands of different cards. For card games like this, object detection might not be the best approach, as there would be thousands of different classes to consider.
The two different approaches I am considering:
Using object detection, create as many classes as there are different playing cards in the game, training the model to detect every single card individually,
or
Using object detection, train the model to detect the playing card itself, then use the detected playing card as input for an image classification algorithm.
For me there are pros and cons for both methods:
The first approach might be much more accurate, as it detects each card individually. On the other hand, it seems to me that it needs considerably more classes and more data to feed into those classes. It might also be difficult to expand the model with more unique cards, as you would have to retrain the model every time.
The second approach might not be as accurate, as it might not only detect playing cards but also identify other objects as playing cards. On the flip side, it seems to me that it is much easier to expand the model with more unique cards.
What might be the best approach here? Do you have a different approach to this, which might be more efficient?
Between these two options, I would prefer to go with the second one. The pros outweigh the cons, in my view: it is much easier to scale, that's for sure, and if you want to extend this model to other card games, that is a valuable point. But I would also suggest trying plain image classification. I am not sure whether it can outperform the second option (I think it can't), but it can be faster, and if it is still good, why not give it a go? A standard multi-label CNN is worth trying, I think.
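If it helps, here is a minimal sketch of the two-stage pipeline (detect the card, then classify the crop). `detect_cards` and `classify_crop` are hypothetical placeholders standing in for a real single-class detector and a real classifier; the logic they use here is purely for illustration:

```python
import numpy as np

def detect_cards(image):
    """Placeholder for a generic 'playing card' detector (e.g. a
    single-class YOLO or Faster R-CNN model). Returns bounding boxes
    as (x0, y0, x1, y1). Here we fake one detection for illustration."""
    h, w = image.shape[:2]
    return [(w // 4, h // 4, 3 * w // 4, 3 * h // 4)]

def classify_crop(crop, class_names):
    """Placeholder for the second-stage classifier (e.g. a CNN, or an
    embedding plus nearest-neighbour lookup, which makes adding new
    cards cheap). Here we just hash the mean pixel as a stand-in."""
    idx = int(crop.mean()) % len(class_names)
    return class_names[idx]

def recognise_cards(image, class_names):
    """Run the detector, crop each detection, classify each crop."""
    results = []
    for (x0, y0, x1, y1) in detect_cards(image):
        crop = image[y0:y1, x0:x1]
        results.append(((x0, y0, x1, y1), classify_crop(crop, class_names)))
    return results

cards = recognise_cards(np.zeros((100, 100), dtype=np.uint8),
                        ["card_A", "card_B", "card_C"])
print(cards)
```

The point of this structure is that expanding to new cards only touches `classify_crop`; the detector stays untouched.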
Context: I have partial images of different types of vehicles in my data set (partial because of the limited field of view of my camera lens). These partial images cover more than half of the vehicle and can be considered good representative images of it. The vehicle categories are car, bus, and truck. I always get a wheel of the vehicle in these images, and because I capture them during different parts of the day, the colour intensity of the wheels varies throughout the day. However, a wheel is definitely present in all the images.
Question: I wanted to know whether the presence of an object that appears in every image of a data set, but is not logically useful for classification, will affect the CNN in any way. Basically, I wanted to know whether, before training the CNN, I should mask the object (i.e. black it out) in all the images, or just let it be.
A CNN creates a hierarchical decomposition of the image into combinations of various discriminative patterns. These patterns are learnt during training, so as to find those that separate the classes well.
If an object is present in every image, it is likely that it is not needed to separate the classes and won't be learnt. If there is some variation in the object that is class-dependent, then maybe it will be used. It is really difficult to know beforehand which features are important. Maybe buses have shinier wheels than other vehicles, and this is something you have not noticed; in that case, having the wheel in the image is beneficial.
If you have inadvertently introduced some class-specific variation, this can cause a problem for later classification. For example, if you only took photos of buses at night, the network might learn night = bus, and when you show it a photo of a bus during the day it won't classify it correctly.
However, using dropout in the network forces it to learn multiple features for classification, and not just rely on one. So if there is variation, this might not have as big an impact.
I would use the images without blanking anything out. Unless it is something simple such as background removal of particles etc., finding and blacking out the object adds another layer of complexity. You can test if the wheels make a big difference by training the network on the normal images, then classifying a few training examples with the object blacked out and seeing if the class probabilities change.
Focus your energy on doing good data augmentation; that is where you will get the most gains.
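As a rough illustration of what augmentation does, here is a minimal numpy sketch. Real pipelines would use a library such as torchvision or albumentations; the flip probability and the brightness range here are arbitrary, chosen to mimic the day/night intensity variation mentioned above:

```python
import numpy as np

def augment(image, rng):
    """Apply a couple of cheap augmentations: a random horizontal flip
    and a random brightness shift. The brightness shift mimics the
    varying colour intensity across the day."""
    out = image.astype(np.float32)
    if rng.random() < 0.5:           # random horizontal flip
        out = out[:, ::-1]
    out += rng.uniform(-30, 30)      # random brightness shift
    return np.clip(out, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
img = np.full((4, 4), 128, dtype=np.uint8)  # a toy grey "image"
aug = augment(img, rng)
print(aug.shape, aug.dtype)
```

Each training epoch then sees a slightly different version of every image, which discourages the network from latching onto incidental features.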
You can see an example of which features are learnt on MNIST in this paper.
I have a large dataset of vehicle images with the ground truth of their lengths (over 100k samples). Is it possible to train a deep network to estimate vehicle length?
I haven't seen any papers related to estimating object size using deep neural network.
[Update: I didn't notice computer-vision tag in the question, so my original answer was for a different question]:
Current convolutional neural networks are pretty good at identifying the vehicle model from raw pixels. The technique is called transfer learning: take a general pre-trained model, such as VGGNet or AlexNet, and fine-tune it on a vehicle data set. For example, here's a report of a CS 231n course project that does exactly this (note: done by students, in 2015). No wonder there are already apps out there that do it on a smartphone.
So it's more or less a solved problem. Once you know the model type, it's easy to look up its size/length.
But if you're asking a more general question, when the vehicle isn't standard (e.g. has a trailer, or somehow modified), this is much more difficult, even for a human being. A slight change in perspective can result in significant error. Not to mention that some parts of the vehicle may be simply not visible. So the answer to this question is no.
Original answer (assumes the data is a table of general vehicle features, not the picture):
I don't see any difference between vehicle size prediction and, for instance, house price prediction. The process is the same (in the simplest setting): the model learns correlations between features and targets from the training data and then is able to predict the values for unseen data.
If you have good input features and a big enough training set (100k will do), you probably don't even need a deep network for this. In many cases that I've seen, the simplest linear regression produces very reasonable predictions, plus it can be trained almost instantly. So, in general, the answer is "yes", but it boils down to what particular data (features) you have.
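As a sketch of that linear-regression baseline, here is ordinary least squares on made-up tabular vehicle features; the feature names, coefficients, and noise level are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic "vehicle" features: wheelbase, height, number of axles.
# In the real task these would come from your tabular data.
n = 1000
X = np.column_stack([
    rng.uniform(2.0, 7.0, n),   # wheelbase (m)
    rng.uniform(1.4, 4.0, n),   # height (m)
    rng.integers(2, 5, n),      # number of axles
])
true_w = np.array([1.6, 0.5, 0.8])
y = X @ true_w + 1.0 + rng.normal(0, 0.1, n)  # length (m) plus noise

# Ordinary least squares with an intercept column.
A = np.column_stack([X, np.ones(n)])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

pred = A @ w
rmse = np.sqrt(np.mean((pred - y) ** 2))
print(round(rmse, 2))
```

If a baseline like this already gets close to the noise floor of your labels, a deep network has little headroom left.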
You may do this under some strict conditions.
A brief introduction to Computer Vision / Multi-View Geometry:
According to the basics of multi-view geometry, the main problem in estimating an object's size is finding the conversion function from camera view to real-world coordinates. By applying different conditions (e.g. capturing many sequential images, as in video or structure-from-motion, or taking pictures of the same object from different angles), we can estimate this conversion function. Hence, this is completely dependent on camera parameters like focal length, pixel width/height, distortion, etc.
Once we have the camera-to-real-world conversion function, it is easy to calculate the camera-to-point distance, and hence the object's size.
So, based on your current task, you need to supply
image
camera's intrinsic parameters
(optionally) camera's extrinsic parameters
and hopefully get the output that you desire.
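To make the geometry concrete, here is the pinhole relation in its simplest form, assuming you know the focal length in pixels (from the intrinsic matrix) and the distance to the object; the numbers below are made up:

```python
def object_size_from_pinhole(size_px, distance_m, focal_px):
    """Pinhole camera model: an object of real size S at distance Z
    projects to s = f * S / Z pixels, where f is the focal length
    expressed in pixels. Inverting gives S = s * Z / f."""
    return size_px * distance_m / focal_px

# Example: a vehicle spanning 500 px, 20 m away, focal length 2000 px.
length_m = object_size_from_pinhole(500, 20.0, 2000.0)
print(length_m)  # 5.0
```

This is the quantity the conversion function ultimately has to recover; lens distortion and perspective make the real version messier.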
Alternatively, if you can fix the camera (same model, same intrinsic/extrinsic parameters), you can directly learn the correlation between that camera's images and distances/object sizes, giving the image as the only input. However, the NN will most probably not work for different cameras.
I am wanting to count the number of cars in aerial images of parking lots. After some research I believe that Haar Cascade Classifiers might be an option for this. An example of an image I will be using would be something similar to a zoomed in image of a parking lot from Google Maps.
My current plan to accomplish this is to train a custom Haar Classifier using cars that I crop out of images in only one orientation (up and down), and then attempt recognition multiple times while rotating the image in 15 degree increments. My specific questions are:
Is using a Haar Classifier a good approach here or is there something better?
Assuming this is a good approach, when cropping cars from larger images for training data would it be better to crop a larger area that could possibly contain small portions of cars in adjacent parking spaces (although some training images would obviously include solo cars, cars with only one car next to them, etc.) or would it be best to crop the cars as close to their outline as possible?
Again assuming I am taking this approach, how could I avoid double counting cars? If a car was recognized in one orientation, I don't want it to be counted again. Is there some way that I could mark a car as counted and have it ignored?
I think in your case I would not go for Haar features; you should search for something that is rotation-invariant.
I would recommend approaching this task in the following order:
Create a solid training/testing data set, and have a good look at papers about obtaining good negative samples. In my experience, good negative samples have a great deal of influence on the resulting quality of your classifier. It makes your life a lot easier if all your samples are of the same image size. Add different types of negative samples: half cars, just pavement, grass, trees, people, etc.
Before starting your search for a classifier, make sure that you have your evaluation pipeline in order: do a 10-fold cross-validation with the simplest Haar classifier possible. Now you have a baseline. Try to keep the software for all the features you tested working, in case you find out that your data set needs adjustment. Ideally, you can just execute a script and rerun your whole evaluation on the new data set automatically.
The problem of counting cars multiple times will not be as important once you find a feature that is rotation-invariant. Still, non-maximum suppression will be in order, because you might not get a good recognition with simple thresholding.
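For reference, greedy non-maximum suppression fits in a few lines of numpy; the boxes, scores, and IoU threshold below are illustrative:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x0, y0, x1, y1)."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x1 - x0) * max(0, y1 - y0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop every remaining
    box that overlaps it above the threshold, repeat."""
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order):
        i = int(order[0])
        keep.append(i)
        order = order[1:][[iou(boxes[i], boxes[j]) < iou_thresh
                           for j in order[1:]]]
    return keep

# Two detections of the same car plus one separate car.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: the duplicate at index 1 is suppressed
```

Counting the surviving boxes then gives the car count without double counting.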
As a tip, you might consider HOG features; I had some good results on cars with them.
I am reading the Tom Mitchell's Machine Learning book, the first chapter.
What I want to do is write a program that plays checkers against itself and learns to win in the end. My question is about the credit assignment for a non-terminal board position it encounters. Maybe we can set its value using a linear combination of its features and random weights, but then how do we update the weights with the LMS rule? After all, we only have training samples for the ending states.
I am not sure whether I state my question clearly although I tried to.
I haven't read that specific book, but my approach would be the following. Suppose that White wins. Then, every position White passed through should receive positive credit, while every position Black passed through should receive negative credit. If you iterate this reasoning, whenever you have a set of moves making up a game, you should add some amount of score to all positions from the victor and remove some amount of score from all positions from the loser. You do this for a bunch of computer vs. computer games.
You now have a data set made up of a bunch of checker positions and respective scores. You can now compute features over those positions and train your favorite regressor, such as LMS.
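A toy sketch of this credit-assignment-plus-LMS idea follows. The position features here are random stand-ins (real ones would be piece counts, kings, mobility, and so on), and the games are simulated rather than played:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_game(n_moves=10):
    """Stand-in for a self-play game: a sequence of position feature
    vectors plus the final outcome (True if White wins)."""
    positions = rng.normal(size=(n_moves, 3))  # hypothetical features
    white_wins = bool(rng.random() < 0.5)
    return positions, white_wins

# Assign credit from the final outcome only: +1 for every position
# reached by the eventual winner's move, -1 for the loser's.
X_rows, y_rows = [], []
for _ in range(200):
    positions, white_wins = simulate_game()
    for t, feats in enumerate(positions):
        mover_is_white = (t % 2 == 0)          # White moves first
        credit = 1.0 if mover_is_white == white_wins else -1.0
        X_rows.append(feats)
        y_rows.append(credit)

X = np.array(X_rows)
y = np.array(y_rows)

# LMS in closed form: least-squares fit of score ~ features.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w.shape)  # one weight per feature
```

With real features, `X @ w` then scores any non-terminal position, which is exactly the evaluation function the question asks how to train.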
An improvement to this approach would be to train the regressor, then play some more games where each move is drawn randomly according to the predicted score of the resulting position (i.e. moves leading to positions with higher scores have higher probability). Then you update those scores, retrain the regressor, and so on.