Fusing Multi-Level Sequential Model By the Flatten Layers


I am trying to build two sequential models with one input and one output. I am not sure whether this is possible: in my research on Keras, everything I find shows how to combine two or three inputs with multiple outputs, which is exactly what I do NOT want.
To explain: I want to take the output of the 1st-level sequential model AT THE FLATTEN LAYER, convert those activations back into a cube shape, and pass that cube as the input of the 2nd-level sequential model, making this a two-level (multi-level) structure. My diagram (see the link) should help explain what I have in mind.
My apologies in advance for any lack of understanding or clarity. I do not know the terminology for such a model or approach, hence the diagram. The best name I came up with is a multi-level sequential model and its fusion mechanism. I searched for examples but have not found any so far.
Thank you in advance for any assistance.
I was following the code below, but it fuses two inputs. I would like to use the last flattened layer of the 1st level as the input for the 2nd level, and then continue on towards classification. Is this possible?
import keras
from keras.models import Sequential
from keras.layers import Conv3D, MaxPooling3D, Dropout, Flatten, Activation, Concatenate, Dense
model = Sequential()
model.add(Conv3D(32, kernel_size=(3, 3, 3), activation='relu',
                 input_shape=input_shape))
model.add(Conv3D(64, (3, 3, 3), activation='relu'))
model.add(MaxPooling3D(pool_size=(2, 2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
print(model.output_shape)
# The additional data (the coordinates x,y,z)
extra = Sequential()
extra.add(Activation('sigmoid', input_shape=(3,)))
print(extra.output_shape)
# Note: this only creates a Concatenate layer (the list is taken as its `axis` argument);
# Keras layers must be called on tensors, not on Sequential models, so nothing is merged here.
merged = Concatenate([model, extra])
# New model should encompass the outputs of the convolutional
# network and the coordinates that have been merged.
# But how?
new_model = Sequential()
new_model.add(Dense(128, activation='relu'))
new_model.add(Dropout(0.8))
new_model.add(Dense(32, activation='sigmoid'))
new_model.add(Dense(num_classes, activation='softmax'))
new_model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=keras.optimizers.Adadelta(),
                  metrics=['accuracy'])
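A minimal sketch of the two-level idea, assuming TensorFlow 2.x Keras and hypothetical sizes (`input_shape = (16, 16, 16, 1)`, `num_classes = 4`): level 1 is the question's Conv3D stack ending in `Flatten`, level 2 starts with a `Reshape` layer that turns the flat vector back into a cube, and the two are chained with the functional API so the combined model has one input and one output. The cube shape `(6, 6, 6, 64)` follows from these hypothetical sizes and would have to be recomputed for real data.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical sizes, chosen only so the shapes below are easy to follow.
input_shape = (16, 16, 16, 1)
num_classes = 4

# Level 1: the question's 3D-conv stack, ending at the Flatten layer.
# 16 -> 14 -> 12 after two 3x3x3 "valid" convolutions, then 6 after 2x2x2 pooling,
# so Flatten outputs 6 * 6 * 6 * 64 = 13824 activations.
level1 = keras.Sequential([
    keras.Input(shape=input_shape),
    layers.Conv3D(32, (3, 3, 3), activation="relu"),
    layers.Conv3D(64, (3, 3, 3), activation="relu"),
    layers.MaxPooling3D(pool_size=(2, 2, 2)),
    layers.Dropout(0.25),
    layers.Flatten(),
], name="level1")

# Level 2: reshape the flat vector back into a cube, then keep convolving.
level2 = keras.Sequential([
    keras.Input(shape=(6 * 6 * 6 * 64,)),
    layers.Reshape((6, 6, 6, 64)),                    # flat vector -> cube
    layers.Conv3D(32, (3, 3, 3), activation="relu"),  # 6 -> 4
    layers.MaxPooling3D(pool_size=(2, 2, 2)),         # 4 -> 2
    layers.Flatten(),                                 # 2 * 2 * 2 * 32 = 256
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(num_classes, activation="softmax"),
], name="level2")

# Chain the two levels: one input, one output, trained end to end.
inputs = keras.Input(shape=input_shape)
outputs = level2(level1(inputs))
model = keras.Model(inputs, outputs)
model.compile(loss="categorical_crossentropy", optimizer="adadelta", metrics=["accuracy"])

# Quick shape check on dummy data.
probs = model.predict(np.zeros((2, *input_shape), dtype="float32"), verbose=0)
```

Because `Flatten` followed immediately by `Reshape` leaves the values unchanged, the same result could be had by feeding the pooled 4D tensor straight into level 2; the explicit round trip mainly matters if level 1 is kept as a separate model whose output is a flat vector, or if a Dense layer is inserted between the two levels.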

Related

Confusion about domain adaptation method: "Encoder" versus "prediction head"

In regard to this article:
Chen, D., Zhou, R., Pan, Y., & Liu, F. (2022). A Simple Baseline for Adversarial Domain Adaptation-based Unsupervised Flood Forecasting. arXiv preprint arXiv:2206.08105.
The authors describe two models. The first model is a 1D-CNN "encoder" with three layers. The second model is a "prediction head". It is also a 1D-CNN with three layers.
How can this be implemented?
For example, I would start by creating two Sequential() models, each with three Conv1D layers using the specified number of filters and kernel sizes. The next step would be to train the encoder on the source and target datasets. But what comes next?
For example:
from keras.models import Sequential
from keras.layers import Conv1D, Dropout
# n_timesteps and n_features describe the input windows (defined elsewhere)
# Encoder model
encoder_model = Sequential()
encoder_model.add(Conv1D(filters=30, kernel_size=2, activation='relu',
                         input_shape=(n_timesteps, n_features)))
encoder_model.add(Dropout(0.2))
encoder_model.add(Conv1D(filters=30, kernel_size=2, activation='relu'))
encoder_model.add(Dropout(0.2))
encoder_model.add(Conv1D(filters=30, kernel_size=2, activation='relu'))
encoder_model.add(Dropout(0.2))
# Prediction head model
# (if the head is applied to the encoder's output, its input_shape should be
#  the encoder's output shape rather than (n_timesteps, n_features))
ph_model = Sequential()
ph_model.add(Conv1D(filters=36, kernel_size=2, activation='relu',
                    input_shape=(n_timesteps, n_features)))
ph_model.add(Conv1D(filters=36, kernel_size=2, activation='relu'))
ph_model.add(Conv1D(filters=1, kernel_size=3))
There is also a "residual connection" in the prediction head. How do I add that?
The article includes this diagram:
How would this look when programmed, for example, in keras?
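One possible way to wire an encoder and a prediction head with a residual connection, sketched with the Keras functional API (TensorFlow 2.x assumed). The window sizes are hypothetical, `padding="same"` is used so that shapes line up, and the placement of the skip connection (around the head's first two Conv1D layers, with a 1x1 convolution to match channel counts) is an assumption, since the paper's diagram is not reproduced here. The adversarial domain-adaptation step (a domain discriminator trained against the encoder's features) is not shown.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_timesteps, n_features = 24, 5  # hypothetical window length and feature count


def build_encoder():
    # Three Conv1D layers (30 filters, kernel 2) as in the question;
    # padding="same" keeps the time axis at n_timesteps.
    inp = keras.Input(shape=(n_timesteps, n_features))
    x = inp
    for _ in range(3):
        x = layers.Conv1D(30, kernel_size=2, padding="same", activation="relu")(x)
        x = layers.Dropout(0.2)(x)
    return keras.Model(inp, x, name="encoder")


def build_head(in_channels=30):
    # Prediction head with an assumed residual (skip) connection
    # around its first two Conv1D layers.
    inp = keras.Input(shape=(n_timesteps, in_channels))
    x = layers.Conv1D(36, kernel_size=2, padding="same", activation="relu")(inp)
    x = layers.Conv1D(36, kernel_size=2, padding="same", activation="relu")(x)
    # A 1x1 convolution projects the input to 36 channels so it can be added.
    shortcut = layers.Conv1D(36, kernel_size=1)(inp)
    x = layers.Add()([x, shortcut])
    out = layers.Conv1D(1, kernel_size=3, padding="same")(x)  # one value per time step
    return keras.Model(inp, out, name="prediction_head")


encoder = build_encoder()
head = build_head()

# Forecaster = encoder followed by head; both remain reusable sub-models
# (e.g. the encoder could also feed a domain discriminator).
series_in = keras.Input(shape=(n_timesteps, n_features))
forecaster = keras.Model(series_in, head(encoder(series_in)))
forecaster.compile(loss="mse", optimizer="adam")

y_hat = forecaster.predict(np.zeros((3, n_timesteps, n_features), dtype="float32"), verbose=0)
```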

How to forecast one output with multiple features by LSTM model?

I am playing with some stock time-series data and trying to predict the trend with multivariate features. Below is the sample dataset I have, which includes different technical indicators for each stock, such as moving averages and Parabolic SAR. In the online sources I have found, most examples predict one stock from one feature at a time, like the "Close" price. How can I use all the stocks' features to predict one output, say the S&P close price? I know it may not boost prediction accuracy, but I am not sure what I am training right now and hope to gain more insight into the LSTM model.
Basically, I put the whole dataset in and do the scaling and training. How do I specify that the prediction is for one particular column?
Code:
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_feature_data = scaler.fit_transform(feature_data)
# training_set / testing_set: train/test splits of scaled_feature_data (split not shown);
# the last column holds the target
X_train, y_train = training_set[:, :-1], training_set[:, -1]
X_test, y_test = testing_set[:, :-1], testing_set[:, -1]
# reshape to (samples, timesteps=1, features)
X_train = X_train.reshape((X_train.shape[0], 1, X_train.shape[1]))
X_test = X_test.reshape((X_test.shape[0], 1, X_test.shape[1]))
Model:
model_lstm = Sequential()
model_lstm.add(LSTM(50, return_sequences=True, input_shape=(X_train.shape[1], X_train.shape[2])))
model_lstm.add(Dropout(0.2))
model_lstm.add(LSTM(units=50, return_sequences=True))
model_lstm.add(Dropout(0.2))
model_lstm.add(LSTM(units=50, return_sequences=True))
model_lstm.add(Dropout(0.2))
model_lstm.add(LSTM(units=50))
model_lstm.add(Dropout(0.2))
model_lstm.add(Dense(units=1, activation='relu'))

This is a regression problem: you are predicting one variable. So the last layers may look like:
model_lstm.add(Dense(1))
model_lstm.compile(loss='mse', optimizer='rmsprop')
Also, you probably don't need return_sequences for every input token. If return_sequences=True, the LSTM's output is a 3D tensor of shape (batch_size, num_tokens, num_features) rather than the vector (batch_size, num_features) that the final Dense(1) layer is meant to receive; Dense would then be applied to every time step and produce (batch_size, num_tokens, 1). Just use the output of the last time step by setting return_sequences=False. That output depends on all previous tokens, so you won't lose much information from them.
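To see the shape difference concretely, here is a small check (TensorFlow 2.x Keras assumed, sizes hypothetical) that applies `Dense(1)` after an LSTM with and without `return_sequences`.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

batch, timesteps, features = 4, 10, 6  # hypothetical sizes
x = np.zeros((batch, timesteps, features), dtype="float32")

seq_out = layers.LSTM(50, return_sequences=True)(x)    # one 50-vector per time step
last_out = layers.LSTM(50, return_sequences=False)(x)  # only the last time step
dense_on_seq = layers.Dense(1)(seq_out)    # Dense acts on the last axis -> one value per step
dense_on_last = layers.Dense(1)(last_out)  # one value per sample, as a regressor needs
```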
The whole model may look like this:
model_lstm = Sequential()
model_lstm.add(LSTM(50, return_sequences=False, input_shape=(X_train.shape[1], X_train.shape[2])))
model_lstm.add(Dropout(0.5))
model_lstm.add(Dense(1))
model_lstm.compile(loss='mse', optimizer='rmsprop')
If you want more layers it will become:
model_lstm = Sequential()
model_lstm.add(LSTM(64, return_sequences=True, dropout=0.5, input_shape=(X_train.shape[1], X_train.shape[-1])))
model_lstm.add(Dropout(0.5))
model_lstm.add(LSTM(32, return_sequences=False, dropout=0.5))
model_lstm.add(Dense(1))
model_lstm.compile(loss='mse', optimizer='rmsprop')
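On the original question of predicting one column from many features, a minimal sketch (TensorFlow 2.x Keras assumed, with synthetic random data standing in for the indicators) is to slice the multi-feature series into windows and take the target from one chosen column at the following time step. `make_windows` is a hypothetical helper, and column 0 merely plays the role of the S&P close; the question's own reshape to `(samples, 1, features)` corresponds to a window of length 1.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def make_windows(data, target_col, window):
    """Hypothetical helper: slice a (time, features) array into
    (samples, window, features) inputs, with the target taken from
    `target_col` at the step right after each window."""
    X, y = [], []
    for start in range(len(data) - window):
        X.append(data[start:start + window, :])
        y.append(data[start + window, target_col])
    return np.array(X), np.array(y)


rng = np.random.default_rng(0)
data = rng.random((200, 10)).astype("float32")  # 200 time steps, 10 features (synthetic)
X, y = make_windows(data, target_col=0, window=20)  # column 0 stands in for the S&P close

model = keras.Sequential([
    keras.Input(shape=(X.shape[1], X.shape[2])),  # all 10 features feed the LSTM
    layers.LSTM(50),                              # return_sequences=False: one vector per sample
    layers.Dropout(0.2),
    layers.Dense(1),                              # linear output for regression
])
model.compile(loss="mse", optimizer="rmsprop")
model.fit(X, y[:, None], epochs=1, batch_size=32, verbose=0)  # y as (samples, 1)
pred = model.predict(X[:5], verbose=0)
```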

Actually, I am wondering how all the features, let's say 10 technical-indicator features, can help predict the one price column?
I don't know if I understand you correctly, but that's what ML does: it tries to find correlations between the features and how they can be used to predict something. So maybe the "DE30" has (or seems to have) an influence on the price and is therefore helpful. Was that your question?
Regarding "From different online sources, most of them are predicting one stock with one feature like 'Close' price at a time":
I guess that is for simplicity, which is why they used only one feature.
Let me know if that was what you asked for.

Keras model with multiple outputs not converging

I am enjoying the simplicity that Keras offers, however I have not been successful in configuring a Keras regression model with multiple outputs.
More specifically, I have a Keras model that consumes X values with 308 columns and 28 target Y values. The model is (I think) quite simple, and I would have expected it to converge quite quickly, but in fact it does not.
I am guessing here, but I think I have setup the model incorrectly and am looking for assistance on how to configure a Keras model to work properly.
Data information:
Number of rows: 46038
My input shape: X_train: (46038, 308)
My target shape: Y_train: (46038, 28)
The inputs (X) are a series of floats representing values that influence the allocation of a resource. The targets are a series of floats (which total/sum to 1.0 representing the actual percent allocation to a particular resource). My goal is to predict resource pct allocations (Y) based upon the provided inputs (X) As such, I believe this is a regression problem and not a classification problem (correct me if I am wrong)
Sample data:
X: [100, 200, 400, 600, 32, 1, 0.1, 0.5, 2500...] (308 columns, with 40000+ rows)
Y: [0.333, 0.667, 0.0, 0.0, 0.0, ...]
In the case of Y above, this means that 0.333 (33%) of the resource is allocated to the first resource, 0.667 (67%) to the second resource, and 0.0 to all the others.
Model:
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
model = Sequential()
model.add(Dense(256, input_shape=(308,) ))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(256, input_shape=(256,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(28))
model.compile(loss='mean_squared_error', optimizer='adam')
Here are a few specific questions:
1. Is my model configured properly to achieve my goals?
2. Should I have different activation functions?
3. Are my input shapes (308,) setup properly? Are my output shapes (28) correct?
4. Should I have an activation on my output layer (for example model.add(Activation('softmax')))? If so, which type would be ideal?
(I don't think it is particularly relevant, but I am using a Tensorflow backend)

from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
model = Sequential()
model.add(Dense(256, input_shape=(308,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(256))  # input_shape is only needed on the first layer
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(28, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
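This should solve the problem. Although it looks like a regression problem, the allocations compete with each other (each target row is non-negative and sums to 1), which makes it more like classification: a softmax output guarantees predictions with the same property, and categorical_crossentropy accepts such soft (non-one-hot) targets.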
Update
For early stopping you'll need a validation set and the following code:
early_stopping = keras.callbacks.EarlyStopping(monitor='val_loss', patience=0, verbose=0, mode='auto')
# validation_split must be > 0 (or validation_data must be given), otherwise val_loss does not exist
model.fit(X, y, batch_size=100, epochs=100, verbose=1, callbacks=[early_stopping], validation_split=0.1, shuffle=True)
Also, if you want a metric other than accuracy, define a custom metric function that returns the cross-entropy loss and pass it in the metrics argument of model.compile.
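A minimal sketch of the answer's setup (TensorFlow 2.x Keras assumed), using synthetic random data with the question's 308 input and 28 output columns: targets are normalized so each row sums to 1, the output layer is softmax, and early stopping monitors `val_loss` from a validation split. The row count, epochs and patience are illustrative only.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

rng = np.random.default_rng(0)
n_rows, n_in, n_out = 500, 308, 28  # column counts from the question; rows are synthetic
X = rng.random((n_rows, n_in)).astype("float32")
# Synthetic allocation targets: non-negative rows that sum to 1.
raw = rng.random((n_rows, n_out)).astype("float32")
Y = raw / raw.sum(axis=1, keepdims=True)

model = keras.Sequential([
    keras.Input(shape=(n_in,)),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(n_out, activation="softmax"),  # outputs are non-negative and sum to 1
])
model.compile(loss="categorical_crossentropy", optimizer="adam")

early_stopping = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=2, restore_best_weights=True)
history = model.fit(X, Y, epochs=5, batch_size=100, validation_split=0.2,
                    callbacks=[early_stopping], verbose=0)

pred = model.predict(X[:3], verbose=0)
```

Each predicted row sums to 1, which a linear output layer trained with MSE would not guarantee.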

When and where should we use these keras LSTM models

I know how RNNs, LSTMs, neural nets and activation functions work, but among the various available LSTM models I don't know which one to use for which data and when. I created these 5 models as a sample of the different varieties of LSTM models I have seen, but I don't know which kind of sequence dataset each one suits best. Most of my confusion is in the second/third lines of these models. Are model1 and model4 the same? Why is model1.add(LSTM(10, input_shape=(max_len, 1), return_sequences=False)) different from model4.add(Embedding(X_train.shape[1], 128, input_length=max_len))? I would much appreciate it if someone could explain these five models in simple English.
from keras.layers import Dense, Dropout, Embedding, LSTM, Bidirectional, TimeDistributed
from keras.models import Sequential
# max_len and X_train come from the data preparation (not shown)
#model1
model1 = Sequential()
model1.add(LSTM(10, input_shape=(max_len, 1), return_sequences=False))
model1.add(Dense(1, activation='sigmoid'))
model1.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model1.summary()  # summary() prints the table itself
#model2
model2 = Sequential()
model2.add(LSTM(10, batch_input_shape=(1, 1, 1), return_sequences=False, stateful=True))
model2.add(Dense(1, activation='sigmoid'))
model2.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model2.summary()
#model3
model3 = Sequential()
model3.add(TimeDistributed(Dense(X_train.shape[1]), input_shape=(X_train.shape[1], 1)))
model3.add(LSTM(10, return_sequences=False))
model3.add(Dense(1, activation='sigmoid'))
model3.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model3.summary()
#model4
model4 = Sequential()
model4.add(Embedding(X_train.shape[1], 128, input_length=max_len))
model4.add(LSTM(10))
model4.add(Dense(1, activation='sigmoid'))
model4.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model4.summary()
#model5
model5 = Sequential()
model5.add(Embedding(X_train.shape[1], 128, input_length=max_len))
model5.add(Bidirectional(LSTM(10)))
model5.add(Dense(1, activation='sigmoid'))
model5.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model5.summary()

So:
1. The first network is the best fit for classification. It simply analyses the whole sequence, and once all input steps have been fed to the model, it is able to make a decision. There are other variants of this architecture (using e.g. GlobalAveragePooling1D or GlobalMaxPooling1D) which are pretty similar from a conceptual point of view.
2. The second network is, from a design point of view, quite similar to the first. The difference is that in the first approach two consecutive fit and predict calls are totally independent, whereas here (stateful=True) the starting state of the second call is the final state of the first. This enables a lot of useful scenarios, e.g. analysing sequences of varying length, or decision-making processes, because you can effectively stop inference/training, modify the network or the input, and come back to it with an updated state.
3. The third network is the best one when you don't want to use a recurrent network at every stage of your computation. Especially when your network is big, introducing recurrent layers is quite costly in terms of parameter count (adding a recurrent connection usually at least doubles the number of parameters). So you can apply a static network as a preprocessing stage and then feed its results to the recurrent part. This makes training easier.
4. The fourth model is a special case of case 3. Here you have a sequence of integer token indices, and the Embedding layer maps each index to a dense vector. This is equivalent to one-hot encoding the tokens and applying a bias-free Dense layer, but the one-hot vectors are never built, which makes the process less memory-consuming.
5. A bidirectional network gives you the advantage of knowing, at each step, not only the previous history of the sequence but also the following steps. This comes at a computational cost, and you also lose the possibility of feeding data sequentially, since you need the full sequence when the analysis is performed.
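To illustrate point 4, a small check (TensorFlow 2.x Keras assumed, hypothetical sizes) that an `Embedding` lookup of integer token ids gives the same result as multiplying one-hot vectors by the embedding weight matrix, without ever building the one-hot array.

```python
import numpy as np
from tensorflow.keras import layers

vocab_size, embed_dim = 50, 8  # hypothetical vocabulary size and embedding width

emb = layers.Embedding(input_dim=vocab_size, output_dim=embed_dim)
tokens = np.array([[3, 7, 7, 0, 49, 12]])  # one sequence of 6 integer token ids
looked_up = np.asarray(emb(tokens))        # index lookup, shape (1, 6, embed_dim)

W = emb.get_weights()[0]                                # embedding matrix, (vocab_size, embed_dim)
one_hot = np.eye(vocab_size, dtype="float32")[tokens]   # explicit one-hot, (1, 6, vocab_size)
via_matmul = one_hot @ W                                # same vectors via matrix product
```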

Keras: How to feed input directly into other hidden layers of the neural net than the first?

I have a question about Keras, which I'm rather new to. I'm using a convolutional neural net that feeds its results into a standard perceptron layer, which generates my output. This CNN is fed with a series of images. So far this is quite normal.
Now I would like to pass a short non-image input vector directly into the last perceptron layer without sending it through all the CNN layers. How can this be done in Keras?
My code looks like this:
from keras.layers import Conv2D, Activation, MaxPooling2D, Dropout, Flatten, Dense
from keras.regularizers import l2
# last CNN layer before perceptron layer
model.add(Conv2D(200, (2, 2), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Dropout(0.25))
# perceptron layer
model.add(Flatten())
# here I would like to add an additional vector directly to the input coming from the CNN
model.add(Dense(1500, kernel_regularizer=l2(1e-3)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
Any answers are greatly appreciated, thanks!

You didn't show which kind of model you use, but I assume you initialized your model as Sequential. In a Sequential model you can only stack one layer after another, so adding a "short-cut" connection is not possible.
For this reason the authors of Keras added the option of building "graph" models (in current Keras this is the functional API, i.e. keras.Model with several Input layers). In this case you build a graph (DAG) of your computations. It's a bit more complicated than designing a stack of layers, but still quite easy.
Check the documentation site for more details.

Provided your Keras's backend is Theano, you can do the following:
import theano
import numpy as np
d = Dense(1500, W_regularizer=l2(1e-3), activation='relu') # I've joined activation and dense layers, based on assumption you might be interested in post-activation values
model.add(d)
model.add(Dropout(0.5))
model.add(Dense(1))
c = theano.function([d.get_input(train=False)], d.get_output(train=False))
layer_input_data = np.random.random((1,20000)).astype('float32') # refer to d.input_shape to get proper dimensions of layer's input, in my case it was (None, 20000)
o = c(layer_input_data)

The answer here works. It is higher-level and also works with the TensorFlow backend:
from keras.layers import Input, Dense, Concatenate
from keras.models import Model
input_1 = Input(shape=input_shape)
input_2 = Input(shape=input_shape)
merged = Concatenate()([input_1, input_2])  # Add() or Dot(axes=...) are other merge layers
hidden = Dense(hidden_dims)(merged)
classify = Dense(output_dims, activation="softmax")(hidden)
model = Model(inputs=[input_1, input_2], outputs=classify)
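A minimal functional-API sketch of the question's network with a side input (TensorFlow 2.x Keras assumed). The image shape `(8, 8, 1)` and side-vector length 3 are hypothetical, kept small so the 1500-unit Dense layer stays cheap; `Concatenate` joins the flattened CNN features and the extra vector right before that layer, and the model is then fed a list of two arrays.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers, regularizers

img_shape = (8, 8, 1)  # hypothetical image size
extra_dim = 3          # hypothetical length of the non-image vector

image_in = keras.Input(shape=img_shape, name="image")
extra_in = keras.Input(shape=(extra_dim,), name="extra")

# CNN part, as in the question (only the image goes through it).
x = layers.Conv2D(200, (2, 2), padding="same", activation="relu")(image_in)
x = layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2))(x)
x = layers.Dropout(0.25)(x)
x = layers.Flatten()(x)  # 4 * 4 * 200 = 3200 features

# Join CNN features and the side vector right before the perceptron layer.
x = layers.Concatenate()([x, extra_in])
x = layers.Dense(1500, activation="relu", kernel_regularizer=regularizers.l2(1e-3))(x)
x = layers.Dropout(0.5)(x)
out = layers.Dense(1)(x)

model = keras.Model(inputs=[image_in, extra_in], outputs=out)
model.compile(loss="mse", optimizer="adam")

pred = model.predict([np.zeros((4, *img_shape)), np.zeros((4, extra_dim))], verbose=0)
```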
