I am wondering if it's possible to have multiple softmax layers for the output of your neural network. I have survey data where each question has four unique possible answers. Many of these answer sets do not overlap, such as one answer set being x1, x2, x3, x4 and another being x1, x5, x6, x7, where each xn is a unique response. Can I create a softmax layer for each question, so that each question gets its own softmax layer providing a probability weighting over that question's answer set?
I would like to be able to eventually input new questions with new answer sets into my model.
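For concreteness, this is roughly what I have in mind; a minimal sketch using the Keras functional API, where the input size, layer widths, and the two question heads are just placeholders:

import tensorflow as tf

# Placeholder sizes/names: a shared trunk feeding one softmax head per question.
inputs = tf.keras.Input(shape=(100,))
shared = tf.keras.layers.Dense(64, activation="relu")(inputs)

# Each question gets its own 4-way softmax over its own answer set.
q1_out = tf.keras.layers.Dense(4, activation="softmax", name="question_1")(shared)
q2_out = tf.keras.layers.Dense(4, activation="softmax", name="question_2")(shared)

model = tf.keras.Model(inputs=inputs, outputs=[q1_out, q2_out])
model.compile(optimizer="adam",
              loss={"question_1": "sparse_categorical_crossentropy",
                    "question_2": "sparse_categorical_crossentropy"})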
I would like to understand the difference between a cost function and an activation function in machine learning problems.
Can you please help me understand the difference?
A cost function is a measure of error between the value your model predicts and the actual value. For example, say we wish to predict the value yi for the data point xi. Let fθ(xi) represent the prediction, or output, of some arbitrary model for the point xi with parameters θ. One possible cost function is the sum of (yi − fθ(xi))² over the data points (this is only an example; it could be the absolute value instead of the square). Training the hypothetical model we stated above would be the process of finding the θ that minimizes this sum.
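As a toy illustration of that definition (my own sketch, assuming a one-parameter linear model fθ(x) = θ·x):

def cost(theta, xs, ys):
    # Sum of squared errors between the predictions theta*x and the targets y
    return sum((y - theta * x) ** 2 for x, y in zip(xs, ys))

# Training amounts to searching for the theta that minimizes this sum,
# e.g. cost(2.0, [1, 2, 3], [2, 4, 6]) == 0.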
An activation function transforms the shape/representation of the data as it moves through the model. A simple example is max(0, x), a function which outputs 0 if the input x is negative and x if the input x is positive. This function is known as the "Rectified Linear Unit" (ReLU) activation function. These transformations are essential for making high-dimensional data linearly separable, which is one of the many uses of a neural network. The choice of activation function depends on your use case, the kind of layer (hidden/output), and the model architecture.
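A minimal sketch of that ReLU example:

def relu(x):
    # "Rectified Linear Unit": 0 for negative inputs, the identity for positive inputs
    return max(0.0, x)

relu(-3.0)  # 0.0
relu(2.5)   # 2.5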
For an application I'm making I'm using tf.keras.models.Sequential. I know that there are linear and multilinear regression models in machine learning. The documentation of Sequential says that the model is a linear stack of layers. Is that equal to multilinear regression? The only explanation of "linear stack of layers" I could find was this question on Stack Overflow.
import numpy as np
import tensorflow as tf

def trainModel(bow, unitlabels, units):
    # Bag-of-words features and integer class labels
    x_train = np.array(bow)
    print("X_train: ", x_train)
    y_train = np.array(unitlabels)
    print("Y_train: ", y_train)
    # Linear stack of layers: Dense -> Dropout -> softmax output
    model = tf.keras.models.Sequential([
        tf.keras.layers.Dense(256, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(len(units), activation=tf.nn.softmax)])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=50)
    return model
You are confusing two important things here. One is the model itself, and the other is the structure of the model.
The structure of the model is indeed linear, because the layers follow one another in a straight line from beginning to end.
The model itself is not linear: the ReLU activation is there to make sure the learned function is not linear.
The "linear stack" is neither a linear regression nor a multilinear one. "Linear" here is not an ML term but plain English for "straightforward".
Tell me if I misunderstood the question in any regard.
The documentation of Sequential says that the model is a linear stack of layers. Is that equal to multilinear regression?
Assuming you mean a regression with multiple variables, no.
tf.keras.models.Sequential() defines how the layers in your model are connected; specifically, in this case it means they are fully connected (every output from the first layer is connected as an input to every neuron in the next layer). The term "linear" is used to mean that there is no funny business going on, e.g. recurrency (connections going backwards) or residual connections (connections skipping layers).
For context, a regression with multiple variables is comparable to a single-layer network with a single neuron that has multiple inputs and no transfer function.
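To make that comparison concrete, here is a minimal sketch (my own example, not from the question's code) of a regression with multiple variables expressed as a single Dense unit with no activation:

import tensorflow as tf

# y = w1*x1 + w2*x2 + w3*x3 + b: a single neuron, three inputs, no transfer function,
# which is exactly a linear regression with multiple variables.
linreg = tf.keras.models.Sequential([
    tf.keras.layers.Dense(1, activation=None, input_shape=(3,))
])
linreg.compile(optimizer="sgd", loss="mse")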
A simple example: given an input sequence, I want the neural network to output the median of the sequence. The problem is: if a neural network has learnt to compute the median of n inputs, how can it compute the median of even more inputs? I know that recurrent neural networks can learn functions like max and parity over a sequence, but computing those functions only requires constant memory. What about cases where the memory requirement grows with the input size, like computing the median?
This is a follow-up question to How are neural networks used when the number of inputs could be variable?.
One idea I had is the following: treat each weight as a function of the number of inputs instead of a fixed value. A weight may then have several parameters that define a function, and we train those parameters. For example, if we want the neural network to compute the average of n inputs, we would like each weight function to behave like 1/n, as in the sketch below. Again, the average per se can be computed using recurrent neural networks or a hidden Markov model, but I was hoping this kind of approach could be generalized to problems where the memory requirement grows.
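A toy illustration of what I mean (assuming a hypothetical parametrized form w(n) = a/n + c with trainable a and c):

def weight(n, a, c):
    # A "weight" that depends on the number of inputs n
    return a / n + c

def output(inputs, a, c):
    n = len(inputs)
    return sum(weight(n, a, c) * x for x in inputs)

# For averaging, training should drive a -> 1 and c -> 0:
output([2.0, 4.0, 6.0], a=1.0, c=0.0)  # 4.0, the average of the inputs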
If a neural network learnt to compute the median of n inputs, how can it compute the median of even more inputs?
First of all, you should understand where a neural network is useful. We generally use a neural network for problems where a closed-form mathematical solution is not available. For this problem, using a NN adds little and is inadvisable.
There are other problems of this nature, like forecasting, in which continuous data arrives over time.
One solution to such problems can be a Hidden Markov Model (HMM). But again, such models depend on the correlation between inputs over a period of time, so this kind of model is not effective for problems where the input is completely random.
So, if the input is completely random and the memory requirement grows,
there is not much you can do about it; one possible solution could be growing your memory size.
Just remember one thing: NNs and similar machine learning models aim to extract meaningful information from the data. If the data is just random values, then every model will generate essentially random output.
One more idea: a data transformation. Let N be large enough that it is always bigger than n. We make a net with 2*N inputs. The first N inputs are for data; if n is less than N, the remaining inputs are set to 0. The last N inputs specify which positions hold real data: 1 means data, 0 means no data. In Matlab notation: if v is an input vector of length 2*N, then we put our original data into v(1:n), zeros into v(n+1:N), ones into v(N+1:N+n), and zeros into v(N+n+1:2*N). It is just an idea which I have not checked. If you are interested in applications of neural networks, take a look at the example of how we chose an appropriate machine learning algorithm to classify EEG signals for BCI.
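The same encoding written out in Python/NumPy terms (my own untested sketch of the idea above):

import numpy as np

def encode(data, N):
    # data holds the n original values, with n <= N
    n = len(data)
    v = np.zeros(2 * N)
    v[:n] = data          # first N slots: the data, zero-padded
    v[N:N + n] = 1.0      # last N slots: a mask marking which slots hold real data
    return v

encode([3.0, 1.0, 2.0], N=5)
# -> [3. 1. 2. 0. 0. 1. 1. 1. 0. 0.]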
Neural nets sum up weights, but RBMs... multiply weights into a probability? So is an RBM kind of like a bidirectional neural net that multiplies its weights instead of adding them?
First off, a restricted Boltzmann machine is a type of neural network, so there is no difference between a NN and an RBM. I think by NN you really mean the traditional feedforward neural network. Also, note that neither feedforward neural networks nor RBMs are considered fully connected networks. The terminology "fully connected" comes from graph theory and means that each node is connected to every other node, which is clearly not the case here. The layers are, however, fully connected from one to another.
Traditional feedforward neural networks
The traditional FNN model is a supervised learning algorithm for modelling data. To train this network one needs a dataset containing labelled instances. One will present each item to the network, consecutively compute activations for each layer up the network until the output layer is reached and then compare the output with the target output (the label). One then typically uses the backpropagation algorithm to obtain the gradient of the weights and biases for each unit in order to update these parameters via gradient descent. Typically, either the entire dataset or batches of it are passed through the network in one go, and the parameter updates are computed with respect to them all.
RBMs
The RBM model is a version of the Boltzmann machine model that has been restricted for computational efficiency: RBMs are BMs without connections between units within the same layer. This isn't the place to go into detail, but I will point you to some external resources. There are a number of variations of the algorithm, and the explanations online do not make this clear, nor are they very helpful for the inexperienced.
Neural networks are algorithms for fitting models to datasets. In an RBM, we attempt to do this using 2 layers of nodes: a "visible layer" that we set to the input and a "hidden layer" that we use to model the input layer. Crucially, the learning process is unsupervised. Training involves using the hidden layer to reconstruct the visible layer and updating the weights and biases using the difference between the node states before and after reconstruction (I have very much simplified this explanation; for more information note that this training algorithm is called contrastive divergence (CD)). Also note that neurons are activated probabilistically in this model. The connections between each layer are bidirectional, thus the network forms a bipartite graph.
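For a rough idea of what one CD-1 update looks like, here is a simplified NumPy sketch for a single binary training vector (my own illustration; real implementations batch this and add more machinery):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_vis, b_hid, lr=0.1, rng=None):
    rng = rng or np.random.default_rng()
    # Positive phase: hidden probabilities and a sampled hidden state for the data
    p_h0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Reconstruction: sample the visible layer back from the hidden state
    p_v1 = sigmoid(h0 @ W.T + b_vis)
    v1 = (rng.random(p_v1.shape) < p_v1).astype(float)
    # Negative phase: hidden probabilities for the reconstruction
    p_h1 = sigmoid(v1 @ W + b_hid)
    # Update with the difference between the "before" and "after" statistics
    W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
    b_vis += lr * (v0 - v1)
    b_hid += lr * (p_h0 - p_h1)
    return W, b_vis, b_hid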
Importantly, RBMs do not produce an output in the same way FNNs do. Because of this, they are often used to pre-train a network before an output layer is added and another algorithm, such as an autoencoder, is used with the weights learned by the RBM.
Check out these resources:
http://blog.echen.me/2011/07/18/introduction-to-restricted-boltzmann-machines/
http://halfbakedmaker.org/?p=11748
http://deeplearning.net/tutorial/rbm.html
https://en.wikipedia.org/wiki/Restricted_Boltzmann_Machine
http://image.diku.dk/igel/paper/AItRBM-proof.pdf
http://makandracards.com/mark/6817-bm-rbm-for-beginners
In general
The performance of any network depends on its parameters and design choices, as well as the problem to which it is applied. RBMs and FNNs are suitable for different kinds of problems.
I highly recommend Geoffrey Hinton's course "Neural Networks for Machine Learning" on Coursera; the course is no longer running, but the lectures are available for free.
An RBM does not "multiply its weights"; at the level of a single neuron it does exactly the same thing as a neuron in a "typical" neural net. The only difference is its stochastic nature.
I decided to make a genetic algorithm to train neural networks. The networks will develop through inheritance, where one of the (many) variable genes will be the transfer function.
So I need to go deeper into the mathematics, and it is really time-consuming.
I have, for example, three variants of the transfer function gene:
1) log-sigmoid function
2) tan-sigmoid function
3) Gaussian function
One feature of the transfer function gene should be that it can modify the parameters of the function to get a different shape.
And now, the problem that I am not yet capable of solving:
I have an error at the output of the neural network; how do I transfer it back onto the weights through different functions with different parameters? According to my research, I think it has something to do with derivatives and gradient descent.
I am a high-level math noob. Can someone explain to me, with a simple example, how to propagate the error back onto the weights through a parametrized (for example) sigmoid function?
EDIT
I am still doing research, and now I am not sure whether I have misunderstood backpropagation. I found this document:
http://www.google.cz/url?sa=t&rct=j&q=backpropagation+algorithm+sigmoid+examples&source=web&cd=10&ved=0CHwQFjAJ&url=http%3A%2F%2Fwww4.rgu.ac.uk%2Ffiles%2Fchapter3%2520-%2520bp.pdf&ei=ZF9CT-7PIsak4gTRypiiCA&usg=AFQjCNGWZjabH5ALbDLgSOBak-BTRGmS3g
and it has an example of computing the weights in which the transfer function is NOT involved in the weight adjustment.
So is it not necessary to involve the transfer function in the weight adjustment?
Backpropagation does indeed have something to do with derivatives and gradient descent.
I don't think there is any shortcut to truly understanding the math, but this may help; I wrote it for someone else with basically the same question, and it should at least explain at a high level what's going on and why.
How does a back-propagation training algorithm work?
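As a small addition to the linked answer, here is a minimal sketch (my own example, not from that post) of one gradient step through a parametrized sigmoid; note that the transfer function does enter the weight update, via its derivative:

import numpy as np

# One neuron: y = 1 / (1 + exp(-a * (w . x + b))), squared error E = 0.5 * (y - t)^2.
def backprop_step(x, t, w, b, a, lr=0.1):
    z = np.dot(w, x) + b
    y = 1.0 / (1.0 + np.exp(-a * z))   # parametrized log-sigmoid with slope a
    dE_dy = y - t                      # derivative of the squared error
    dy_dz = a * y * (1.0 - y)          # derivative of the transfer function
    delta = dE_dy * dy_dz              # chain rule
    w = w - lr * delta * x             # dz/dw = x
    b = b - lr * delta                 # dz/db = 1
    return w, b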