I have 2 different inputs where the second one is the output of the first (visible in the image- note that I have not included all of the lines between the network).
I'm trying to build a network where between the first input and the second there is one or more dense layers (fully connected layers), and then between the second input and the output there is again, one or more dense layers.
Is this possible?
The network I'm trying to build:
I have started by defining my network like this:
model = Sequential()
model.add(Dense(num_neurons, input_dim=input, activation='relu'))
model.add(Dense(output, activation='softmax'))
model = Sequential()
model.add(Dense(num_neurons, input_dim=input, activation='relu'))
model.add(Dense(output, activation='softmax'))
Since the inputs are a vector I didn't know if maybe using RNN/LSTM is even possible.
Should I go for a different network design?
I created the network using Keras Functional API!
It appeared like it allow for much more diversification in the network, allowing many inputs and outputs.
Related
I'm trying to train a neural network to predict the ratings for players in FIFA 18 by easports (ratings are between 64-99). I'm using their players database (https://easports.com/fifa/ultimate-team/api/fut/item?page=1) and I've processed the data into training_x, testing_x, training_y, testing_y. Each of the training samples is a numpy array containing 7 values...the first 6 are the different stats of the player (shooting, passing, dribbling, etc) and the last value is the position of the player (which I mapped between 1-8, depending on the position), and each of the testing values is a single integer between 64-99, representing the rating of that player.
I've tried many different hyperparameters, including changing the activation functions to tanh and relu, and I've tried adding a batch normalization layer after the first dense layer (I thought that it might be useful since one of my features is very small and the other features are between 50-99), I've played around with the SGD optimizer (changed the learning rate, momentum, even tried changing the optimizer to Adam), tried different loss functions, added/removed dropout layers, and tried different regularizers for the weights of the model.
model = Sequential()
model.add(Dense(64, input_shape=(7,),
kernel_regularizer=regularizers.l2(0.01)))
//batch normalization?
model.add(Activation('sigmoid'))
model.add(Dense(64, kernel_regularizer=regularizers.l2(0.01),
activation='sigmoid'))
model.add(Dropout(0.3))
model.add(Dense(32, kernel_regularizer=regularizers.l2(0.01),
activation='sigmoid'))
model.add(Dense(1, activation='linear'))
sgd = optimizers.SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mean_absolute_error', metrics=['accuracy'],
optimizer=sgd)
model.fit(training_x, training_y, epochs=50, batch_size=128, shuffle=True)
When I train the model, the loss is always nan and the accuracy is always 0, even though I've tried adjusting a lot of different parameters. However, if I remove the last feature from my data, the position of the players, and update the input shape of the first dense layer, the model actually "trains" and ends up with around 6% accuracy no matter what parameters I change. In that case, I've found that the model only predicts 79 to be the player's rating. What am I doing inherently wrong?
You can try the following steps :
Use mean squared error loss function.
Use Adam which will help you converge faster with low learning rate like 0.0001 or 0.001. Otherwise, try using the RMSprop optimizer.
Use the default regularizers. That is none actually.
Since this is a regression task, use activation function like ReLU in all the layers except the output layer ( including the input layer ). Use linear activation in output layer.
As mentioned in the comments by #pooyan , normalize the features. See here. Even try standardizing the features. Use whichever suites the best.
I'm trying to classify 1D data with 3-layered feedforward neural network (multilayer perceptron).
Currently I have input samples (time-series) consisting of 50 data points each. I've read on many sources that number of neurons in input layer should be equal to number of data points (50 in my case), however, after experimenting with cross validation a bit, I've found that I can get slightly better average classification (with lover variation as well) performance with 25 neurons in input layer.
I'm trying to understand math behind it: does it makes any sense to have lower number of neurons than data points in input layer? Or maybe results are better just because of some errors?
Also - are there any other rules to set number of neurons in input layer?
Update - to clarify what I mean:
I use Keras w tensorflow backend for this. My model looks like this:
model = Sequential()
model.add(Dense(25, input_dim=50, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(input_data, output_data, epochs=150, batch_size=10)
predictions = model.predict(X)
rounded = [round(x[0]) for x in predictions]
print(rounded)
input_data, output_data - numpy arrays with my data points in former and corresponding value of 1 or 0 in latter.
25 is number of neurons in first layer and input_dim is number of my data points, therefore technically it works, yet I'm not sure whether it makes sense to do so or I misunderstood concept of neurons in input layer and what they do.
I want to create a neural network that will exactly fit to my sample, however at some point the learning process stops. Different optimizers like adam or sgd also don't work.
Cybenko proved that the statement from the title is true for any sigmoidal function as an activation function so I want to use either sigmoid or tanh. I want also to keep only one hidden layer.
What are tweaks that I can use? Here is my setup (python3) and results:
features = np.linspace(-0.5,.5,2000).reshape(2000,1)
target = np.sin(12*features)
hn = 100
nn = models.Sequential()
#Only one hidden layer
nn.add(layers.Dense(units=hn, activation='sigmoid',
input_shape=(features.shape[1],)))
nn.add(layers.Dense(units=1, activation='linear'))
# Compile neural network
nn.compile(loss='mse',
optimizer='RMSprop',
metrics=['mse'])
nn.fit(features,
target,
epochs=200,
verbose=2,
batch_size=5)
I know how a RNN, LSTM, neural nets,activation function works but from various available LSTM models I dont know what should I use for which data and when. I created these 5 models as a sample of different varites of LSTM models I have seen but I dont know which optimal sequence dataset should use. I have most of my confussion in the second/third lines of these models. Are model1 and model4 are same? Why is model1.add(LSTM(10, input_shape=(max_len, 1), return_sequences=False)) different from model4.add(Embedding(X_train.shape[1], 128, input_length=max_len)) . I would much appreciate If some one can explain these five models in simple english.
from keras.layers import Dense, Dropout, Embedding, LSTM, Bidirectional
from keras.models import Sequential
from keras.layers.wrappers import TimeDistributed
#model1
model1 = Sequential()
model1.add(LSTM(10, input_shape=(max_len, 1), return_sequences=False))
model1.add(Dense(1, activation='sigmoid'))
model1.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print model1.summary()
#model2
model2 = Sequential()
model2.add(LSTM(10, batch_input_shape=(1, 1, 1), return_sequences=False, stateful=True))
model2.add(Dense(1, activation='sigmoid'))
model2.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print model2.summary()
#model3
model3 = Sequential()
model3.add(TimeDistributed(Dense(X_train.shape[1]), input_shape=(X_train.shape[1],1)))
model3.add(LSTM(10, return_sequences=False))
model3.add(Dense(1, activation='sigmoid'))
model3.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print model3.summary()
#model4
model4 = Sequential()
model4.add(Embedding(X_train.shape[1], 128, input_length=max_len))
model4.add(LSTM(10))
model4.add(Dense(1, activation='sigmoid'))
model4.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print model4.summary()
#model5
model5 = Sequential()
model5.add(Embedding(X_train.shape[1], 128, input_length=max_len))
model5.add(Bidirectional(LSTM(10)))
model5.add(Dense(1, activation='sigmoid'))
model5.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print model5.summary()
So:
First network is the best one for classification. It's simply analysing the whole sequence - and once all input steps are fed to a model - it's able to perform a decision. There are other variants of this architecture (using e.g. GlobalAveragePooling1D or max one) which are pretty similiar from a conceptual point of view.
Second network - from a design point of view is quite similar to a first architecture. What differs them is the fact that in a first approach two consequent fit and predict calls are totally independent, whereas here - the starting state for second call is the same to the last one in a first. This enables a lot of cool scenarios like e.g. varying length sequences analysis or e.g. decision making processes thanks to the fact that you could effecitively stop inference / training process - affect network or input and come back to it with actualized state.
Is the best one when you don't want to use recurrent network at all stages of your computations. Especially - when your network is big - introducing a recurrent layers is quite costly from a parameter number point of view (introducing a recurrent connection usually increases the number of parameter by a factor of at least 2). So you could apply a static network as a preprocessing stage - and then you feed results to a recurrent part. This makes training easier.
Model is a special case of case 3. Here - you have a sequence of tokens which are coded by a one-hot encoding and then transformed using Embedding. This makes the process less memory consuming.
Bidrectional network provides you an advantage of knowing at each step not only a sequence previous history - but also further steps. This is at computational cost and also you are losing the possibilty of a sequential data feed - as you need to have a full sequence when analysis is performed.
I have a question about using Keras to which I'm rather new. I'm using a convolutional neural net that feeds its results into a standard perceptron layer, which generates my output. This CNN is fed with a series of images. This is so far quite normal.
Now I like to pass a short non-image input vector directly into the last perceptron layer without sending it through all the CNN layers. How can this be done in Keras?
My code looks like this:
# last CNN layer before perceptron layer
model.add(Convolution2D(200, 2, 2, border_mode='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Dropout(0.25))
# perceptron layer
model.add(Flatten())
# here I like to add to the input from the CNN an additional vector directly
model.add(Dense(1500, W_regularizer=l2(1e-3)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
Any answers are greatly appreciated, thanks!
You didn't show which kind of model you use to me, but I assume that you initialized your model as Sequential. In a Sequential model you can only stack one layer after another - so adding a "short-cut" connection is not possible.
For this reason authors of Keras added option of building "graph" models. In this case you can build a graph (DAG) of your computations. It's a more complicated than designing a stack of layers, but still quite easy.
Check the documentation site to look for more details.
Provided your Keras's backend is Theano, you can do the following:
import theano
import numpy as np
d = Dense(1500, W_regularizer=l2(1e-3), activation='relu') # I've joined activation and dense layers, based on assumption you might be interested in post-activation values
model.add(d)
model.add(Dropout(0.5))
model.add(Dense(1))
c = theano.function([d.get_input(train=False)], d.get_output(train=False))
layer_input_data = np.random.random((1,20000)).astype('float32') # refer to d.input_shape to get proper dimensions of layer's input, in my case it was (None, 20000)
o = c(layer_input_data)
The answer here works. It is more high level and works also for the tensorflow backend:
input_1 = Input(input_shape)
input_2 = Input(input_shape)
merge = merge([input_1, input_2], mode="concat") # could also to "sum", "dot", etc.
hidden = Dense(hidden_dims)(merge)
classify = Dense(output_dims, activation="softmax")(hidden)
model = Model(input=[input_1, input_2], output=hidden)