How to fuse an additional feature in training deep neural networks? - machine-learning

I am training a Convolutional Neural Network (CNN) to classify spectrogram images (frequency over time). These spectrograms were created from signals collected at specific times, so the time of collection is also an important feature. How can I use this collection time as a feature while training a CNN on the spectrogram images?
One option, I guess, would be to add it as an extra node in the last Dense layer, but I don't know which Keras functionality would allow me to do so.
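A minimal sketch of the idea described in the question, assuming the Keras functional API is available through TensorFlow: the spectrogram goes through a small CNN, the collection time enters as a second input, and a `Concatenate` layer joins the two before the Dense layers. The layer sizes, the input shape, the number of classes, and the encoding of time as one scaled number are all illustrative assumptions, not something stated in the question.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def build_fused_model(image_shape=(64, 64, 1), num_classes=4):
    # Branch 1: a small CNN over the spectrogram image.
    image_in = keras.Input(shape=image_shape, name="spectrogram")
    x = layers.Conv2D(16, 3, activation="relu")(image_in)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(32, 3, activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)

    # Branch 2: the collection time as one scaled number per sample
    # (assumption: e.g. hour of day divided by 24).
    time_in = keras.Input(shape=(1,), name="collection_time")

    # Fusion: append the time value to the image features,
    # then classify from the combined vector.
    fused = layers.Concatenate()([x, time_in])
    fused = layers.Dense(32, activation="relu")(fused)
    out = layers.Dense(num_classes, activation="softmax")(fused)
    return keras.Model(inputs=[image_in, time_in], outputs=out)


model = build_fused_model()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Random stand-in data, only to show how both inputs are passed.
images = np.random.rand(8, 64, 64, 1).astype("float32")
times = np.random.rand(8, 1).astype("float32")
labels = np.random.randint(0, 4, size=(8,))

model.fit([images, times], labels, epochs=1, verbose=0)
probs = model.predict([images, times], verbose=0)
```

The same pattern extends to several extra features by widening the second input, for example `shape=(2,)` for a sine/cosine encoding of the time of day.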

Related

Facial recognition and classifying unknowns with neural networks

As far as I understand, neural networks aren't good at classifying 'unknowns', i.e. items that do not belong to a learned class. But how do face detection/recognition approaches usually determine that no face is detected/recognised in a region? Is the predicted probability somehow thresholded?
Summary
It is true that neural networks are inherently not good at classifying 'unknowns', because they tend to overfit to the data they were trained on when the underlying structure of the network is complex enough. However, there are several ways to reduce the effects of overfitting. One such technique is dropout; another is batch normalization. Even with these techniques, the best way to reduce the effects of overfitting is to use more data.
For the facial recognition example you have given, the trained models have usually 'seen' a huge amount of data. This means that there are very few 'unknowns', and even when there are, the neural network has learned how to tell whether facial features are present or not. This is because certain neural network structures are very good at detecting whether a pattern of features is present in the input data. If these features are found, the input is classified as a face; otherwise it is not.
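The thresholding that the question asks about can be sketched as follows. This is only an illustration of the general idea, not a description of how any particular face detector works: the function names, the threshold of 0.8, and the `-1` label for "unknown" are assumptions made for the example.

```python
import numpy as np


def softmax(logits):
    # Subtract the row maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def predict_with_rejection(logits, threshold=0.8, unknown_label=-1):
    """Return the most probable class, or unknown_label when the
    highest class probability falls below the threshold."""
    probs = softmax(np.asarray(logits, dtype=float))
    best = probs.argmax(axis=1)
    confident = probs.max(axis=1) >= threshold
    return np.where(confident, best, unknown_label)


logits = [
    [4.0, 0.5, 0.2],  # one class clearly dominates
    [1.0, 1.1, 0.9],  # no class stands out
]
labels = predict_with_rejection(logits)
```

Here the first row is assigned class 0 and the second is rejected as unknown. Note that softmax outputs can be confidently wrong on inputs unlike the training data, so a threshold of this kind is a heuristic rather than a guarantee.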

Fine tuning of pre-trained convolutional neural network [closed]

From what I have read about fine-tuning a pre-trained network, it is done in the following two steps (in short):
Freeze the hidden layer, unfreeze the fully connected layer, and train.
Unfreeze both layers and train again.
My questions are:
Is it enough to perform only the first step?
If I perform only the first step, is that not the same as using the network as a feature extractor?
(Using the network as a feature extractor means extracting features with the pre-trained network and classifying them with a traditional machine learning classification algorithm.)
If you want more information to clarify the question, please let me know.
There are some issues with your question...
First, you clearly imply a network with only 2 layers, which is rather (very) far from the way fine-tuning is actually used in practice nowadays.
Second, what exactly do you mean by "enough" in your first question (enough for what)?
In fact, there is considerable overlap between the notions of pre-trained models, feature extractors, and fine-tuning, and different people may use these terms in slightly different ways. One approach, adopted by the Stanford CNNs for Visual Recognition course, is to consider all of them as special cases of something more general called transfer learning; here is a useful excerpt from the respective section of that course, which arguably addresses the spirit (if not the letter) of your questions:
The three major Transfer Learning scenarios look as follows:
ConvNet as fixed feature extractor. Take a ConvNet pretrained on ImageNet, remove the last fully-connected layer (this layer’s outputs are the 1000 class scores for a different task like ImageNet), then treat the rest of the ConvNet as a fixed feature extractor for the new dataset. In an AlexNet, this would compute a 4096-D vector for every image that contains the activations of the hidden layer immediately before the classifier. We call these features CNN codes. It is important for performance that these codes are ReLUd (i.e. thresholded at zero) if they were also thresholded during the training of the ConvNet on ImageNet (as is usually the case). Once you extract the 4096-D codes for all images, train a linear classifier (e.g. Linear SVM or Softmax classifier) for the new dataset.
Fine-tuning the ConvNet. The second strategy is to not only replace and retrain the classifier on top of the ConvNet on the new dataset, but to also fine-tune the weights of the pretrained network by continuing the backpropagation. It is possible to fine-tune all the layers of the ConvNet, or it’s possible to keep some of the earlier layers fixed (due to overfitting concerns) and only fine-tune some higher-level portion of the network. This is motivated by the observation that the earlier features of a ConvNet contain more generic features (e.g. edge detectors or color blob detectors) that should be useful to many tasks, but later layers of the ConvNet become progressively more specific to the details of the classes contained in the original dataset. In case of ImageNet for example, which contains many dog breeds, a significant portion of the representational power of the ConvNet may be devoted to features that are specific to differentiating between dog breeds.
Pretrained models. Since modern ConvNets take 2-3 weeks to train across multiple GPUs on ImageNet, it is common to see people release their final ConvNet checkpoints for the benefit of others who can use the networks for fine-tuning. For example, the Caffe library has a Model Zoo where people share their network weights.
When and how to fine-tune? How do you decide what type of transfer learning you should perform on a new dataset? This is a function of several factors, but the two most important ones are the size of the new dataset (small or big), and its similarity to the original dataset (e.g. ImageNet-like in terms of the content of images and the classes, or very different, such as microscope images). Keeping in mind that ConvNet features are more generic in early layers and more original-dataset-specific in later layers, here are some common rules of thumb for navigating the 4 major scenarios:
New dataset is small and similar to original dataset. Since the data is small, it is not a good idea to fine-tune the ConvNet due to overfitting concerns. Since the data is similar to the original data, we expect higher-level features in the ConvNet to be relevant to this dataset as well. Hence, the best idea might be to train a linear classifier on the CNN codes.
New dataset is large and similar to the original dataset. Since we have more data, we can have more confidence that we won’t overfit if we were to try to fine-tune through the full network.
New dataset is small but very different from the original dataset. Since the data is small, it is likely best to only train a linear classifier. Since the dataset is very different, it might not be best to train the classifier from the top of the network, which contains more dataset-specific features. Instead, it might work better to train the SVM classifier from activations somewhere earlier in the network.
New dataset is large and very different from the original dataset. Since the dataset is very large, we may expect that we can afford to train a ConvNet from scratch. However, in practice it is very often still beneficial to initialize with weights from a pretrained model. In this case, we would have enough data and confidence to fine-tune through the entire network.
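The two-step procedure from the question (train only the new classifier on a frozen base, then unfreeze and continue) can be sketched in Keras as below. The small `base` network is a hypothetical stand-in for a real pretrained ConvNet so that the example needs no weight download; the layer sizes, the five output classes, and the choice of a smaller learning rate for the second step are assumptions for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical stand-in for a pretrained ConvNet; in practice its
# weights would come from a model trained on a large dataset.
base = keras.Sequential(
    [
        keras.Input(shape=(32, 32, 3)),
        layers.Conv2D(8, 3, activation="relu"),
        layers.Conv2D(16, 3, activation="relu"),
        layers.GlobalAveragePooling2D(),
    ],
    name="pretrained_base",
)

# New classifier head for the new dataset (assumed to have 5 classes).
inputs = keras.Input(shape=(32, 32, 3))
features = base(inputs)
outputs = layers.Dense(5, activation="softmax", name="new_classifier")(features)
model = keras.Model(inputs, outputs)

# Step 1: freeze the base so only the new classifier is trained.
# With the base frozen, this is the "fixed feature extractor" scenario.
base.trainable = False
model.compile(
    optimizer=keras.optimizers.Adam(1e-3),
    loss="sparse_categorical_crossentropy",
)
step1_trainable = len(model.trainable_weights)  # kernel + bias of the head
# model.fit(x_train, y_train, ...) would go here.

# Step 2: unfreeze the base and continue training the whole network.
# Recompiling is needed for the change in `trainable` to take effect.
base.trainable = True
model.compile(
    optimizer=keras.optimizers.Adam(1e-5),
    loss="sparse_categorical_crossentropy",
)
step2_trainable = len(model.trainable_weights)  # head + both conv layers
# model.fit(x_train, y_train, ...) would go here.
```

Stopping after step 1 leaves the base weights untouched, which is why it behaves like feature extraction followed by a separately trained classifier.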

Binary Classification with Neural Networks?

I have a dataset of the order of MxN. I want to perform binary classification on this dataset using neural networks. I was looking into Recurrent Neural Networks. Although LSTMs can be used for autoencoders, I am not sure whether they can be used for classification (I am trying to do binary classification). I am very new to neural networks and deep learning models, and I am not really sure whether there is a way of achieving binary classification with neural networks. I tried a Bernoulli RBM on my dataset, but I am not sure how to use this model to perform classification. I also came across Pipeline(). Again, I am not sure how to achieve my goal.
Any help would be greatly appreciated.
OK, something doesn't add up. If you have unlabelled data and you want to group it, you should take a look at K-Means (http://scikit-learn.org/stable/modules/clustering.html#k-means).
Regarding classification with LSTMs: you run your input through the RNN layers, take the last output, and feed it into some convolutional or fully connected layers, which handle the classification in the usual way.
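The LSTM approach described above can be sketched in Keras as follows, assuming labelled sequence data. The sequence shape, the layer width, and the random stand-in data are hypothetical; the point is only that the LSTM's last output is passed to a fully connected layer with a sigmoid for the binary decision.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

timesteps, features = 10, 3  # hypothetical sequence shape

model = keras.Sequential(
    [
        keras.Input(shape=(timesteps, features)),
        # return_sequences defaults to False, so only the output at the
        # last time step is passed on.
        layers.LSTM(16),
        # Fully connected layer producing one probability per sample.
        layers.Dense(1, activation="sigmoid"),
    ]
)
model.compile(optimizer="adam", loss="binary_crossentropy")

# Random stand-in data with binary labels, only to show the shapes.
x = np.random.rand(6, timesteps, features).astype("float32")
y = np.random.randint(0, 2, size=(6, 1))

model.fit(x, y, epochs=1, verbose=0)
probs = model.predict(x, verbose=0)
predicted = (probs >= 0.5).astype(int)  # threshold into classes 0 and 1
```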

Confusion in machine learning concept for the object detection when using Aggregate Channel Features

I have a point of confusion regarding the machine learning concept behind object detection.
The two main modules in object detection are proposal extraction and detection.
For the proposal extraction module:
I want to use Aggregate Channel Features (ACF) for proposal extraction. This algorithm needs training (positive and negative samples), and then we can do testing.
For the object detection module:
Let's say I am using a Convolutional Neural Network.
Now my question is: can I first train the ACF with 80% of the samples from the dataset, test its performance, and make it ready to put in the pipeline? Then I split the dataset again and, let's say, choose 40% for training the CNN architecture. This 40% will first go through the trained ACF to extract proposals, and the CNN is then trained on these proposals according to their labels for object detection.
Is this concept right?
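To make the described splitting concrete, the sketch below reproduces it on sample indices and counts how the CNN's 40% overlaps with the data the ACF was trained and tested on. The dataset size and the random seed are hypothetical; the code only measures the overlap and does not decide whether the scheme is appropriate.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000  # hypothetical dataset size

# First split: 80% to train the ACF, the remaining 20% to test it.
indices = rng.permutation(n_samples)
acf_train = indices[:800]
acf_test = indices[800:]

# Second, independent split: 40% of the same dataset to train the CNN.
cnn_train = rng.permutation(n_samples)[:400]

# How many CNN training samples did the ACF already see in training,
# and how many come from the ACF's test portion?
overlap_with_acf_train = np.intersect1d(cnn_train, acf_train).size
overlap_with_acf_test = np.intersect1d(cnn_train, acf_test).size

print(overlap_with_acf_train, overlap_with_acf_test)
```

Because the second split is drawn from the whole dataset, some of the CNN's training samples fall in the ACF's test portion, which matters if that portion is later used to evaluate the full pipeline.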

Image classification using Convolutional neural network

I'm trying to classify hotel image data using a Convolutional Neural Network.
Below are some highlights:
Image preprocessing:
converting to gray-scale
resizing all images to same resolution
normalizing image data
finding pca components
Convolutional neural network:
Input- 32*32
convolution- 16 filters, 3*3 filter size
pooling- 2*2 filter size
dropout- dropping with 0.5 probability
fully connected- 256 units
dropout- dropping with 0.5 probability
output- 8 classes
Libraries used:
Lasagne
nolearn
However, I'm getting low accuracy on the test data, only around 28%.
What could be the reason for such low accuracy? Any suggested improvements?
Thanks in advance.
There are several possible reasons for low accuracy on test data, so without more information and a healthy amount of experimentation, it will be impossible to provide a concrete answer. Having said that, there are a few points worth mentioning:
As @lejlot mentioned in the comments, the PCA pre-processing step is suspicious. The fundamental CNN architecture is designed to require minimal pre-processing, and it's crucial that the basic structure of the image remains intact. This is because CNNs need to be able to find useful, spatially-local features.
For detecting complex objects from image data, it's likely that you'll benefit from more convolutional layers. Chances are, given the simple architecture you've described, that it simply doesn't possess the necessary expressiveness to handle the classification task.
Also, you mention you apply dropout after the convolutional layer. In general, the research I've seen indicates that dropout is not particularly effective on convolutional layers. I personally would recommend removing it to see if it has any impact. If you do wind up needing regularization on your convolutional layers, (which in my experience is often unnecessary since the shared kernels often already act as a powerful regularizer), you might consider stochastic pooling.
Among the most important tips I can give is to build a solid mechanism for measuring the quality of the model and then experiment. Try modifying the architecture and then tuning hyper-parameters to see what yields the best results. In particular, make sure to monitor training loss vs. validation loss so that you can identify when the model begins overfitting.
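As a small illustration of monitoring training loss against validation loss, the helper below reports the epoch with the best validation loss once that loss has stopped improving for a given number of epochs. The function name, the `patience` value, and the loss values are made up for the example.

```python
def first_overfitting_epoch(val_losses, patience=2):
    """Return the index of the best epoch once validation loss has not
    improved for `patience` consecutive epochs, or None if it never stalls."""
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < val_losses[best_epoch]:
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return best_epoch
    return None


# Made-up curves: training loss keeps falling while validation loss
# bottoms out and then rises, the usual sign of overfitting.
train_losses = [1.00, 0.70, 0.50, 0.35, 0.25, 0.18]
val_losses = [1.05, 0.80, 0.65, 0.66, 0.70, 0.78]

best = first_overfitting_epoch(val_losses)
gaps = [round(v - t, 2) for t, v in zip(train_losses, val_losses)]
print(best, gaps)
```

In this example the best epoch is index 2, after which the gap between the two curves keeps widening.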
Since ImageNet 2012, the convolutional neural networks that perform well (state of the art) have been adding more convolutional layers; they even use zero padding so that more convolutional layers can be stacked.
Increase the number of convolutional layers.
Some say that dropout is not very effective on CNNs. It does no harm to use it, but you should lower the dropout rate; try something like 0.2.
Analyse your data. If you have little of it, use data augmentation techniques.
If one of the labels has many more samples than the others, you have an imbalanced-data problem, but you need not worry about that for now.
You can also consider fine-tuning from VGG-Net or some other pretrained CNN.
Also, don't convert to grayscale. After the image-to-array transformation, just divide by 255.
I suspect you learned CNNs from a tutorial (such as MNIST) and concluded that images should be converted to grayscale.
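The scaling step mentioned above can be sketched with NumPy, assuming 8-bit images whose pixel values range from 0 to 255. The function name is hypothetical.

```python
import numpy as np


def to_unit_range(image_uint8):
    # Convert to float first so the division is not done in integer space,
    # then map 0..255 to 0.0..1.0 while keeping all colour channels.
    return image_uint8.astype(np.float32) / 255.0


# A single RGB pixel as a stand-in for a full colour image.
image = np.array([[[0, 128, 255]]], dtype=np.uint8)
scaled = to_unit_range(image)
```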
