Continuous Bag of Words - machine-learning

I have a question related to the Continuous Bag of Words (CBOW) model.
If I have a vocabulary size of 1000, a window size of 2, and the number of nodes in the hidden layer is 100, what is the size of the learnt word embedding of one word?
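For reference, here is a minimal sketch (NumPy, with placeholder values) of the weight shapes in a standard CBOW network; the learnt embedding of one word is a row of the input-to-hidden matrix, so its size equals the hidden layer size, independent of the window size:

    import numpy as np

    V, N = 1000, 100               # vocabulary size, hidden layer size
    W_in = np.random.randn(V, N)   # input-to-hidden weights: the embeddings
    W_out = np.random.randn(N, V)  # hidden-to-output weights

    word_id = 42                   # an arbitrary word index (placeholder)
    print(W_in[word_id].shape)     # (100,) -- the embedding of one word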

How to decide on number of units in RNN Keras layer? [duplicate]

I am new to RNNs and trying to understand them. My question is: is the number of neurons dependent on the size of the sequence and the number of time steps? My understanding is that, since an RNN takes a sequence as input, the number of neurons should equal the size of the sequence: if we have 10 time steps, and thus 10 different inputs, we should have 10 neurons. If not, how do we feed our sequence to the neurons if we have a sequence of size 20 and only 10 neurons?
I will answer your first question.
Generally, yes, the number of neurons depends on the size of the sequence: more neurons give better predictions of longer sequences. But even one neuron may give you a perfect prediction if your sequence is simple (e.g. if your sequence is [1, 1, 1, 1, 1]).
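On the mechanical part of the question (feeding a sequence of size 20 to 10 neurons): the number of units sets the size of the recurrent hidden state, which is reused at every time step, so the unit count and the sequence length are independent choices. A minimal sketch, assuming the Keras API and made-up data:

    import numpy as np
    from tensorflow.keras import layers, models

    # 10 recurrent units processing sequences of 20 time steps with 1 feature
    # per step; the same 10 units are applied at every step of the sequence.
    model = models.Sequential([
        layers.SimpleRNN(10, input_shape=(20, 1)),
        layers.Dense(1),
    ])
    model.compile(optimizer='adam', loss='mse')

    x = np.random.rand(32, 20, 1)  # 32 toy sequences of length 20
    y = np.random.rand(32, 1)
    model.fit(x, y, epochs=1, verbose=0)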

Finding Particular word in a webpage using HOG features and Sliding Window

I want to find the occurrences of a particular word in any webpage given as input.
I used a pyramid sliding window, generating HOG (Histogram of Oriented Gradients) features for all the sliding windows.
For now, I am comparing the HOG features of each window with the HOG features of the word I want to extract.
For the comparison of two HOG feature vectors, I simply take the sum of (vector1(i) - vector2(i)) over all i.
However, the results are below expectations.
My query is whether there is a better way to compare the HOG features of each window with those of the word I want to find.
Or should I train a classifier, such as an SVM, to classify the HOG features of a window?
For training the classifier, I can have at most 100-200 examples of the word I want to find in my data set. And since for an SVM it is better to have equal numbers of positive and negative examples in the data set, how do I restrict the non-word representations (negative examples) to 100-200?
For non-word elements in the training set, I have:
1. ICDAR 2003 (this word data set does not contain the word I want to extract)
2. the CIFAR image data set
The reason I am not extracting/finding this word in the HTML code is that the word can also occur in an image.
Moreover, since the word I want to find is fixed, how many images of the word should I have in the data set?
If you have a fixed font and are looking only for a particular word, here is a simple workaround:
https://stackoverflow.com/a/9647509/8682088
You extract the word box and resize it to, for example, 40x10 pixels. The grayscale pixel values can be your feature vector, on which you then train your SVM. It is primitive, but surprisingly effective.
It works perfectly fine with a fixed font and simple symbols.
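A sketch of that workaround, assuming OpenCV and scikit-learn, with random placeholder arrays standing in for real word/non-word crops:

    import cv2
    import numpy as np
    from sklearn.svm import SVC

    def word_features(crop_gray):
        # Resize each candidate word box to a fixed 40x10 patch and use the
        # flattened grayscale pixel values as the feature vector.
        patch = cv2.resize(crop_gray, (40, 10))  # (width, height) in OpenCV
        return patch.flatten().astype(np.float32) / 255.0

    # Placeholder data: in practice, positives would be crops of the target
    # word and negatives crops of other words or background.
    rng = np.random.default_rng(0)
    positives = [rng.integers(0, 256, (12, 50), dtype=np.uint8) for _ in range(100)]
    negatives = [rng.integers(0, 256, (12, 50), dtype=np.uint8) for _ in range(100)]

    X = np.stack([word_features(c) for c in positives + negatives])
    y = np.array([1] * len(positives) + [0] * len(negatives))

    clf = SVC(kernel='linear')  # a linear SVM on raw pixels, as suggested
    clf.fit(X, y)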

Object Localization through CNN

I am new to deep learning and TensorFlow, and I am trying to train a CNN to localize digits in the Street View House Numbers data set. To this end I have an input set of 32x32 images and, since I want to recognize up to 5 digits, I am using as labels vectors of 20 elements like this:
[top_x_digit1, top_y_digit1, width_digit1, height_digit1, top_x_digit2, ...]
with 0,0,0,0 when there is no digit.
As far as I understand, after (say) 3 layers of convolution and pooling, I can add 5 parallel fully connected layers, each aimed at extracting the box features of a different digit (0,0,0,0 when it is absent).
Is my approach correct?
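For what it's worth, here is a sketch of that architecture in Keras (layer sizes are arbitrary placeholders): three convolution + pooling blocks feeding five parallel fully connected heads, each regressing one digit's (top_x, top_y, width, height):

    from tensorflow.keras import layers, Model

    inp = layers.Input(shape=(32, 32, 1))
    x = inp
    for filters in (16, 32, 64):  # three convolution + pooling blocks
        x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
        x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)

    # Five parallel fully connected heads, one per possible digit; each
    # outputs 4 numbers (top_x, top_y, width, height), zeros when absent.
    heads = []
    for i in range(5):
        h = layers.Dense(64, activation='relu')(x)
        heads.append(layers.Dense(4, name='digit%d_box' % (i + 1))(h))

    out = layers.Concatenate()(heads)  # matches the 20-element label vector
    model = Model(inp, out)
    model.compile(optimizer='adam', loss='mse')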

Scikit learn multilayer neural network

As per the documentation provided by scikit-learn:
hidden_layer_sizes : tuple, length = n_layers - 2, default (100,)
I have a small doubt.
In my code what I have configured is
MLPClassifier(algorithm='l-bfgs', alpha=1e-5, hidden_layer_sizes=(5, 2), random_state=1)
So what do the 5 and 2 indicate?
What I understand is that 5 is the number of hidden layers, but then what is 2?
Ref - http://scikit-learn.org/dev/modules/generated/sklearn.neural_network.MLPClassifier.html#
From the link you provided, in the parameter table, the hidden_layer_sizes row says:
The ith element represents the number of neurons in the ith hidden layer.
This means that you will have len(hidden_layer_sizes) hidden layers, and each hidden layer i will have hidden_layer_sizes[i] neurons.
In your case, (5, 2) means:
1st hidden layer has 5 neurons
2nd hidden layer has 2 neurons
So the number of hidden layers is implicitly set by the length of the tuple.
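You can check this from the fitted weight matrices; a small sketch (note that in released scikit-learn versions the parameter is named solver, with value 'lbfgs', rather than algorithm='l-bfgs'):

    from sklearn.neural_network import MLPClassifier

    X = [[0., 0.], [1., 1.]]
    y = [0, 1]
    clf = MLPClassifier(solver='lbfgs', alpha=1e-5,
                        hidden_layer_sizes=(5, 2), random_state=1)
    clf.fit(X, y)

    # One weight matrix per layer transition: input(2) -> 5 -> 2 -> output(1)
    print([w.shape for w in clf.coefs_])  # [(2, 5), (5, 2), (2, 1)]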
Here are some details I found online concerning the architecture and the units of the input, hidden, and output layers in sklearn:
The number of input units will be the number of features.
For multiclass classification, the number of output units will be the number of labels.
Try a single hidden layer first; if you use more than one, each hidden layer should have the same number of units.
The more units in a hidden layer the better; try the same number as the input features, up to twice or even three or four times that.

Neural Networks (input and output layers)

When dealing with multiclass classification, is it always the case that the number of nodes in the input layer (excluding the bias) is the same as the number of nodes in the output layer?
No. The input layer ingests the features; the output layer makes predictions for the classes. The number of features and the number of classes do not need to be the same; it also depends on how exactly you model the multiclass output.
Lars Kotthoff is right. However, when you are using an artificial neural network to build an autoencoder, you will want to have the same number of input and output nodes, and you will want the output nodes to learn the values of the input nodes.
Nope.
Usually the number of input units equals the number of features you are going to use for training the NN classifier.
The size of the output layer equals the number of classes in the dataset. Furthermore, if the dataset has only two classes, just one output unit is enough to discriminate between them.
The ANN output layer has a node for each class: if you have 3 classes, you use 3 nodes. The input layer (often called the feature vector) has a node for each feature used for prediction, and usually an extra bias node. You usually need only 1 hidden layer, and discerning its ideal size is tricky.
Having too many hidden-layer nodes can result in overfitting and slow training. Having too few hidden-layer nodes can result in underfitting (overgeneralizing).
Here are a few general guidelines (source) to start with:
The number of hidden neurons should be between the size of the input layer and the size of the output layer.
The number of hidden neurons should be 2/3 the size of the input layer, plus the size of the output layer.
The number of hidden neurons should be less than twice the size of the input layer.
If you have 3 classes and an input vector of 30 features, you can start with a hidden layer of around 23 nodes. Add and remove nodes from this layer during training to reduce your error, while testing against validation data to prevent overfitting.
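Worked out in code for that example (30 input features, 3 classes), the three heuristics give:

    n_in, n_out = 30, 3
    lower, upper = n_out, n_in                 # rule 1: between 3 and 30
    two_thirds = round(2 * n_in / 3) + n_out   # rule 2: (2/3)*30 + 3 = 23
    max_hidden = 2 * n_in                      # rule 3: fewer than 60
    print(lower, upper, two_thirds, max_hidden)  # 3 30 23 60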
