keras stuck during optimization - machine-learning
After trying the Keras example on CIFAR10, I decided to go for something bigger : a VGG-like net on the Tiny Imagenet dataset. This is a subset of the ImageNet dataset with 200 classes (instead of 1000) and 100K images downscaled to 64x64.
I got the VGG-like model from the file vgg_like_convnet.py here. Unfortunately, things are going pretty much like here, except that this time neither changing the learning rate nor swapping TH for TF helps. Changing the optimizer does not help either (see code below).
Accuracy is basically stuck at 0.005 which, as was pointed out, is what you would expect from completely random answers with 200 classes. Worse, if by a fluke of weight initialization it starts at, say, 0.007, it quickly converges to 0.005 and firmly stays there for every subsequent epoch.
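For reference, the chance-level figures can be worked out directly. The sketch below assumes a uniform guess over the classes and a softmax output trained with categorical cross-entropy; under those assumptions a network that has learned nothing should report an accuracy near 1/200 and a loss near ln(200).

```python
import math

nb_classes = 200

# Accuracy of a classifier that guesses uniformly at random.
chance_accuracy = 1.0 / nb_classes

# Cross-entropy of a uniform prediction: -log(1 / nb_classes).
chance_loss = math.log(nb_classes)

print("chance accuracy: %.4f" % chance_accuracy)
print("chance loss:     %.4f" % chance_loss)
```

A training loss that sits at about 5.30 from the first epoch onward would suggest the network is emitting an almost uniform distribution over the classes.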
The Keras code (TH version) is below :
from __future__ import print_function
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras.regularizers import l2, activity_l2, l1, activity_l1
from keras.optimizers import SGD, Adam, Adagrad, Adadelta
from keras.utils import np_utils
import numpy as np
import cPickle as pickle
# seed = 7
# np.random.seed(seed)
batch_size = 64
nb_classes = 200
nb_epoch = 30
# input image dimensions
img_rows, img_cols = 64, 64
# the tiny image net images are RGB
img_channels = 3
# Load the train dataset for TH
print('Load training data')
X_train=pickle.load(open('xtrain_shu_th.p','rb')) # np.zeros((100000,3,64,64)).astype('uint8')
y_train=pickle.load(open('ytrain_shu_th.p','rb')) # np.zeros((100000,1)).astype('uint8')
# Load the test dataset for TH
print('Load validation data')
X_test=pickle.load(open('xtest_th.p','rb')) # np.zeros((10000,3,64,64)).astype('uint8')
y_test=pickle.load(open('ytest_th.p','rb')) # np.zeros((10000,1)).astype('uint8')
# the data, shuffled and split between train and test sets
# (X_train, y_train), (X_test, y_test) = cifar10.load_data()
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
model = Sequential()
model.add(ZeroPadding2D((1,1),input_shape=(3,64,64)))
model.add(Convolution2D(64, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(64, 3, 3, activation='relu',))
model.add(MaxPooling2D((2,2), strides=(2,2)))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(128, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_6'].values()))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(128, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_8'].values()))
model.add(MaxPooling2D((2,2), strides=(2,2)))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_11'].values()))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_13'].values()))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_15'].values()))
model.add(MaxPooling2D((2,2), strides=(2,2)))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_18'].values()))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_20'].values()))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_22'].values()))
model.add(MaxPooling2D((2,2), strides=(2,2)))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))
model.add(Flatten())
model.add(Dense(4096))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(4096))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(200, activation='softmax'))
# let's train the model using SGD + momentum (how original).
opt = SGD(lr=0.0001, decay=1e-6, momentum=0.7, nesterov=True)
# opt= Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
# opt = Adadelta(lr=1.0, rho=0.95, epsilon=1e-08, decay=0.0)
# opt = Adagrad(lr=0.01, epsilon=1e-08, decay=0.0)
model.compile(loss='categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print('Optimization....')
model.fit(X_train, Y_train,
          batch_size=batch_size,
          nb_epoch=nb_epoch,
          validation_data=(X_test, Y_test),
          shuffle=True)
# Save the resulting model
model.save('model.h5')
The Tiny Imagenet dataset consists of JPEG images that I converted to PPM with djpeg. I then created a large binary file containing, for each image, the class label (1 byte) followed by the pixel data (64x64x3 bytes).
Reading this file from Keras was excruciatingly slow. So (I'm very new to Python, it might sound dumb to you), I decided to init a 4D Numpy array (100000,3,64,64) (for TH, (100000,64,64,3) for TF) with the dataset and pickle it. It now takes ~40s to load the dataset in the array when I run the code above.
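As an aside on the slow read: if each record really is one label byte followed by 64x64x3 pixel bytes, NumPy can parse the whole file in one call rather than byte by byte. The sketch below is only an illustration under assumptions the post does not state: the pixel bytes are taken to be interleaved RGB in row-major order (as in a PPM body), the helper name `parse_records` is hypothetical, and the example runs under Python 3.

```python
import numpy as np

IMG_BYTES = 64 * 64 * 3          # pixel bytes per record
RECORD_BYTES = 1 + IMG_BYTES     # 1 label byte + pixels


def parse_records(raw):
    """Split a bytes buffer of fixed-size records into labels and images.

    Assumes interleaved RGB pixels (height, width, channel); this layout
    is an assumption, not something stated in the post.
    """
    records = np.frombuffer(raw, dtype=np.uint8).reshape(-1, RECORD_BYTES)
    labels = records[:, :1].copy()                      # shape (n, 1)
    images_tf = records[:, 1:].reshape(-1, 64, 64, 3)   # TF ordering
    images_th = images_tf.transpose(0, 3, 1, 2).copy()  # TH ordering
    return labels, images_th, images_tf


# Tiny synthetic buffer with two records, standing in for the real file.
fake = bytes([7]) + bytes(IMG_BYTES) + bytes([42]) + bytes([255]) * IMG_BYTES
labels, x_th, x_tf = parse_records(fake)
print(labels.ravel(), x_th.shape, x_tf.shape)
```

For a real file, `np.fromfile(path, dtype=np.uint8)` would supply the same kind of flat array that `np.frombuffer` produces here.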
I even checked that the pickled array contained the data in the right order with the code below:
import numpy as np
import cPickle as pickle
print("Reading data")
pix=pickle.load(open('xtrain_th.p','rb'))
print("Done")
img=67857
f=open('img'+str(img)+'.ppm','wb')
f.write('P6\n64 64\n255\n')
for y in range(0,64):
    for x in range(0,64):
        f.write(chr(pix[img][0][y][x]))
        f.write(chr(pix[img][1][y][x]))
        f.write(chr(pix[img][2][y][x]))
f.close()
This extracts PPM images back from the dataset.
Finally, I noticed that the training dataset was too ordered (i.e. the first 500 images all belonged to class 0, the second 500 to class 1, etc.).
So I shuffled them with the code below:
# Dataset preparation for Theano backend
import cPickle as pickle
import numpy as np
import random as rnd
n=100000
print('Load training data')
X_train=pickle.load(open('xtrain_th.p','rb')) # np.zeros((100000,3,64,64)).astype('uint8')
y_train=pickle.load(open('ytrain_th.p','rb')) # np.zeros((100000,1)).astype('uint8')
tmpa=np.zeros((3,64,64)).astype('uint8')
# Shuffle the data
print('Shuffling training data')
for _ in range(0,n):
    i=rnd.randrange(n)
    j=rnd.randrange(n)
    tmpa=X_train[i]
    X_train[i]=X_train[j]
    X_train[j]=tmpa
    tmp=y_train[i][0]
    y_train[i][0]=y_train[j][0]
    y_train[j][0]=tmp
print('Pickle dump')
pickle.dump(X_train,open('xtrain_shu_th.p','wb'))
pickle.dump(y_train,open('ytrain_shu_th.p','wb'))
Nothing helped. I wasn't expecting 99% accuracy at the first attempt, but at least some movement and then plateau.
I wanted to try TFLearn, but it had a pending bug when I looked a few days ago.
Any ideas ? Thanks in advance
You can use the built-in shuffle of the Keras model API (https://keras.io/models/model/#fit). Just set the shuffle parameter to True. You can do either a batch shuffle or a global shuffle; the default is global shuffle.
One thing to note, though, is that the validation split in fit is done before the shuffling takes place. Therefore, if you want to shuffle your validation data too, I would advise you to use sklearn.utils.shuffle (http://scikit-learn.org/stable/modules/generated/sklearn.utils.shuffle.html).
From github:
if shuffle == 'batch':
    index_array = batch_shuffle(index_array, batch_size)
elif shuffle:
    random.shuffle(index_array)
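If you still prefer to shuffle the arrays yourself, it may be worth checking the swap in the question: indexing a NumPy array with a single integer (`X_train[i]`) returns a view rather than a copy, so a temporary taken that way changes when `X_train[i]` is overwritten, while the scalar label swap behaves as expected. Images and labels could end up misaligned as a result. A minimal sketch of an alternative, assuming both arrays share the same first dimension, is to index them with one shared permutation (the helper name `shuffle_together` is made up for illustration):

```python
import numpy as np


def shuffle_together(X, y, seed=None):
    """Return X and y reordered by the same random permutation."""
    rng = np.random.RandomState(seed)
    perm = rng.permutation(len(X))
    # Fancy indexing builds new arrays, so no view aliasing is involved.
    return X[perm], y[perm]


# Small stand-in data: image k is filled with the value k and has label k.
X = np.arange(10, dtype=np.uint8).reshape(10, 1, 1, 1) * np.ones((1, 3, 4, 4), dtype=np.uint8)
y = np.arange(10, dtype=np.uint8).reshape(10, 1)

X_shu, y_shu = shuffle_together(X, y, seed=0)

# Every image should still carry its own label after shuffling.
print(bool(np.all(X_shu[:, 0, 0, 0] == y_shu[:, 0])))
```

Note that this builds a shuffled copy, so it needs roughly twice the memory of the original array while both are alive.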
Related
View y_true of batch in Keras Callback during training
I am attempting to implement a custom loss function in Keras. It requires that I compute the sum of the inverse class frequencies for each y in B. It is the 1/epsilon(...) portion of the function below.
The function is from this paper (page 7).
Note: I could well be misinterpreting what the paper describes. Please let me know if I am.
I am currently trying to use a Keras Callback and the on_batch_start/end methods to determine the class frequency of the input batch (which means accessing y_true of the batch input), but am having little luck.
Thank you in advance for any help you can offer.
Edit: By "little luck" I mean I cannot find a way to access the y_true of an individual batch during training.
Example: batch_size = 64, train_features.shape == (50000, 120, 20). I can access the Keras model from on_batch_start/end (self.model), but I cannot find a way to access the actual y_true of the batch, size 64.
from tensorflow.python.keras.callbacks import Callback

class FreqReWeight(Callback):
    """
    Update learning rate by batch label frequency distribution -- for use with LDAM loss
    """
    def __init__(self, C):
        self.C = C

    def on_train_begin(self, logs={}):
        self.model.custom_val = 0

    def on_batch_end(self, batch, logs=None):
        print('batch index', batch)
        print('Model being trained', self.model)
        # how can one access the y_true of the batch?
LDAM Loss Function
zj = "the j-th output of the model for the j-th class"
EDIT2
Loss Function - for testing when loss is called
def LDAM(C):
    def loss(y_true, y_pred):
        print('shape', y_true.shape)  # only prints each epoch, not each batch
        return K.mean(y_pred) + C  # NOT LDAM, just dummy for testing purposes
    return loss
Preparing Data, Compiling Model & Training
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
m = 64  # batch_size
model = keras.Sequential()
model.add(Conv2D(32, (3, 3), padding='same', input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(loss=LDAM(1), optimizer='sgd', metrics=['accuracy'])
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
model.fit(x_train, y_train, batch_size=m, validation_data=(x_test, y_test), callbacks=[FreqReWeight(1)])
Solution: I ended up asking a more specific question regarding this. The answer to both can be found here.
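For readers who only need the per-batch class counts the question describes, the arithmetic itself is simple once a batch of one-hot labels is in hand. The following NumPy sketch shows just that counting step on a made-up batch, under one reading of "sum of the inverse class frequencies"; it is not the LDAM loss and does not address how to reach `y_true` from a callback. The function name `inverse_class_frequency_sum` is hypothetical.

```python
import numpy as np


def inverse_class_frequency_sum(y_true_onehot):
    """Sum over the batch of 1 / (count of that sample's class in the batch)."""
    counts = y_true_onehot.sum(axis=0)            # samples per class
    per_sample_count = y_true_onehot.dot(counts)  # count for each sample's own class
    return float(np.sum(1.0 / per_sample_count))


# Made-up batch of 4 one-hot labels over 3 classes: classes 0, 0, 1, 2.
batch = np.array([[1, 0, 0],
                  [1, 0, 0],
                  [0, 1, 0],
                  [0, 0, 1]], dtype=np.float32)

result = inverse_class_frequency_sum(batch)
print(result)
```

With this definition each class present in the batch contributes exactly 1 to the sum, so the example batch (three distinct classes) gives 3.0.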
How to check the learning rate with train_on_batch [Keras]
I am using Keras on Python 2. Does anyone know how to check and modify the learning rate for the Adam optimizer? Here is my neural network, for which I defined my own optimizer. When training on batches with model.train_on_batch(...), I have no way to track the learning rate. Thanks for your help.
def CNN_model():
    # Create model
    model = Sequential()
    model.add(Conv2D(12, (5, 5), input_shape=(1, 256, 256), activation='elu'))
    model.add(MaxPooling2D(pool_size=(3, 3)))
    model.add(Conv2D(12, (5, 5), activation='elu'))
    model.add(MaxPooling2D(pool_size=(4, 4)))
    model.add(Conv2D(12, (3, 3), activation='elu'))
    model.add(MaxPooling2D(pool_size=(3, 3)))
    model.add(Flatten())
    model.add(Dropout(0.3))
    model.add(Dense(128, activation='elu'))
    model.add(Dropout(0.3))
    model.add(Dense(32, activation='elu'))
    model.add(Dense(2, activation='softmax'))
    # Compile model
    my_optimizer = Adam(lr=0.001, decay=0.05)
    model.compile(loss='categorical_crossentropy', optimizer=my_optimizer, metrics=['accuracy'])
    return model
You can do it in several ways. The simplest that comes to mind is to do it through callbacks:
from keras.callbacks import Callback
from keras import backend as K

class showLR(Callback):
    def on_epoch_begin(self, epoch, logs=None):
        lr = float(K.get_value(self.model.optimizer.lr))
        print(" epoch={:02d}, lr={:.5f}".format(epoch, lr))
You can use the ReduceLROnPlateau callback. Add it to your callbacks list, then pass that list to your training call.
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
callbacks = [ReduceLROnPlateau(monitor='val_acc', patience=5, verbose=1, factor=0.5, min_lr=0.00001)]
model = CNN_model()
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_data=(x_valid, y_valid), callbacks=callbacks)
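Note that with `decay` set, the value stored in `optimizer.lr` may not be the step size actually applied. Older Keras optimizers, to the best of my knowledge, applied a time-based decay of the form lr / (1 + decay * iterations) inside the update step, leaving `optimizer.lr` at its base value. Assuming that rule, the effective rate after a given number of `train_on_batch` calls can be estimated by hand; `effective_lr` below is an illustrative helper, not a Keras function.

```python
def effective_lr(base_lr, decay, iterations):
    """Time-based decay: base_lr / (1 + decay * iterations).

    Assumes the schedule used by older Keras optimizers; check the
    optimizer source of your installed version before relying on it.
    """
    return base_lr / (1.0 + decay * iterations)


# Values from the question: Adam(lr=0.001, decay=0.05).
for it in (0, 20, 100, 1000):
    print("iterations=%4d  lr=%.6f" % (it, effective_lr(0.001, 0.05, it)))
```

Under this assumption the rate has already halved after 20 batches, which is worth keeping in mind when a decay of 0.05 is combined with per-batch training.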
Keras convolutional network scoring low on CIFAR-10 Dataset
I'm trying to train a CNN on the CIFAR-10 dataset in Keras, but I'm only getting around 10% accuracy, essentially random. I'm training over 50 epochs, with a batch size of 32 and learning rate of 0.01. Is there anything in particular that I am doing wrong?
import os
import numpy as np
import pandas as pd
from PIL import Image
from keras.models import Model
from keras.layers import Input, Dense, Conv2D, MaxPool2D, Dropout, Flatten
from keras.optimizers import SGD
from keras.utils import np_utils
# trainingData = np.array([np.array(Image.open("train/" + f)) for f in os.listdir("train")]) #shape: 50k, 32, 32, 3
# testingData = np.array([np.array(Image.open("test/" + f)) for f in os.listdir("test")]) #shape: same as training
#
# trainingLabels = np.array(pd.read_csv("trainLabels.csv"))[:,1] #categorical labels ["dog", "cat", "etc"....]
# listOfLabels = sorted(list(set(trainingLabels)))
# trainingOutput = np.array([np.array([1.0 if label == ind else 0.0 for ind in listOfLabels]) for label in trainingLabels]) #converted to output
# #for example: training output for dog =
# #[1.0, 0.0, 0.0, ...]
# np.save("trainingInput.np", trainingData)
# np.save("testingInput.np", testingData)
# np.save("trainingOutput.np", trainingOutput)
trainingInput = np.load("trainingInput.npy") #shape = 50k, 32, 32, 3
testingInput = np.load("testingInput.npy") #shape = 10k, 32, 32, 3
listOfLabels = sorted(list(set(np.array(pd.read_csv("trainLabels.csv"))[:,1]))) #categorical list of labels as strings
trainingOutput = np.load("trainingOutput.npy") #shape = 50k, 10
#looks like [0.0, 1.0, 0.0 ... 0.0, 0.0]
print(listOfLabels)
print("Data loaded\n______________\n")
inp = Input(shape=(32, 32, 3))
conva1 = Conv2D(64, (3, 3), padding='same', activation='relu')(inp)
conva2 = Conv2D(64, (3, 3), padding='same', activation='relu')(conva1)
poola = MaxPool2D(pool_size=(3, 3))(conva2)
dropa = Dropout(0.1)(poola)
convb1 = Conv2D(128, (5, 5), padding='same', activation='relu')(dropa)
convb2 = Conv2D(128, (5, 5), padding='same', activation='relu')(convb1)
poolb = MaxPool2D(pool_size=(3, 3))(convb2)
dropb = Dropout(0.1)(poolb)
flat = Flatten()(dropb)
dropc = Dropout(0.5)(flat)
out = Dense(len(listOfLabels), activation='softmax')(dropc)
print(out.shape)
model = Model(inputs=inp, outputs=out)
lrSet = SGD(lr=0.01, clipvalue=0.5)
model.compile(loss='categorical_crossentropy', optimizer=lrSet, metrics=['accuracy'])
model.fit(trainingInput, trainingOutput, batch_size=32, epochs=50, verbose=1, validation_split=0.1)
print(model.predict(testingInput))
Is there anything in particular that I am doing wrong?
Not necessarily "wrong", but here are some pointers I can suggest:
It is important that you rescale your data, in case you are not doing so. Instead of handling values in the range [0,255], it is better to divide everything by 255 and handle data in the range [0,1]. This helps your model's weights converge faster, as each gradient update will be more significant compared to its unscaled version.
I think that your dropout may be affecting your performance, all the more so given that you are using CNNs and a strong (0.5) Dropout when passing data to your output. Quoting this great answer:
"In the original paper that proposed dropout layers, by Hinton (2012), dropout (with p=0.5) was used on each of the fully connected (dense) layers before the output; it was not used on the convolutional layers. This became the most commonly used configuration. More recent research has shown some value in applying dropout also to convolutional layers, although at much lower levels: p=0.1 or 0.2."
So perhaps reducing your dropout or playing with it a bit will yield better results. Do notice that you are applying consecutive dropouts to your data, which doesn't seem helpful in my opinion and could also be causing problems, so consider redesigning that part:
dropb = Dropout(0.1)(poolb) #drop
flat = Flatten()(dropb) #flatten
dropc = Dropout(0.5)(flat) #then drop again?
Your learning rate may be higher than what is normally used. Although that is SGD's default learning rate, with higher values you may be "rushing" your training and failing to find better minima that could yield better performance. Consider using a lower learning rate (0.001 or lower, adjusting epochs as needed), or adding weight decay to your SGD instance. This can help keep your model from getting stuck in local minima that give sub-optimal results.
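The rescaling step from the first point can be sketched as follows, using a small random array in place of the real `trainingInput` (the stand-in data is an assumption for illustration only):

```python
import numpy as np

# Stand-in for image data loaded as 8-bit integers, shape (n, 32, 32, 3).
rng = np.random.RandomState(0)
raw = rng.randint(0, 256, size=(8, 32, 32, 3)).astype(np.uint8)

# Convert to float first, then divide, so values land in [0, 1].
scaled = raw.astype(np.float32) / 255.0

print(scaled.dtype, float(scaled.min()), float(scaled.max()))
```

The same scaling would need to be applied to the test inputs before calling `model.predict`.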
Why different intermediate layer output of CNN in keras?
I am using this code to perform some experiments. I want to use the intermediate representation from the layer just before the last fully connected layer of the CNN.
from __future__ import print_function
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import Embedding
from keras.layers import Conv1D, GlobalMaxPooling1D
from keras.datasets import imdb
# set parameters:
max_features = 5000
maxlen = 400
batch_size = 100
embedding_dims = 50
filters = 250
kernel_size = 3
hidden_dims = 250
epochs = 100
print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')
print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)
print('Build model...')
model = Sequential()
# we start off with an efficient embedding layer which maps
# our vocab indices into embedding_dims dimensions
model.add(Embedding(max_features, embedding_dims, input_length=maxlen))
model.add(Dropout(0.2))
# we add a Convolution1D, which will learn filters
# word group filters of size filter_length:
model.add(Conv1D(filters, kernel_size, padding='valid', activation='relu', strides=1))
# we use max pooling:
model.add(GlobalMaxPooling1D())
# We add a vanilla hidden layer:
model.add(Dense(hidden_dims))
model.add(Dropout(0.2))
model.add(Activation('relu'))#<======== I need output after this.
# We project onto a single unit output layer, and squash it with a sigmoid:
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
To get the intermediate representation of the penultimate layer, I used the following code.
CODE1
get_layer_output = K.function([model.layers[0].input, K.learning_phase()], [model.layers[6].output])
# output in test mode = 0
layer_output_test = get_layer_output([x_test, 0])[0]
# output in train mode = 1
layer_output_train = get_layer_output([x_train, 1])[0]
print(layer_output_train)
print(layer_output_train.shape)
CODE2
def get_activations(model, layer, X_batch):
    get_activations = K.function([model.layers[0].input, K.learning_phase()], [model.layers[layer].output,])
    activations = get_activations([X_batch,1])
    return activations
import numpy as np
X_train=np.array(get_activations(model=model,layer=6, X_batch=x_train)[0], dtype=np.float32)
print(X_train)
print(X_train.shape)
Which one is correct, given that the two snippets print different output? I want to multiply the correct output by weights and optimise with a custom optimiser.
If you pass 1 to K.learning_phase() you will get different results every time. But both codes give the same result.
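The likely reason is dropout: in training mode (learning phase 1) a fresh random mask is drawn on each call, while in test mode (learning phase 0) the layer passes its input through unchanged. The NumPy sketch below imitates that behaviour with a hand-written inverted dropout; it is an illustration of the idea, not Keras's implementation.

```python
import numpy as np


def dropout(x, rate, training, rng):
    """Inverted dropout: zero a random subset and rescale the rest."""
    if not training:
        return x
    keep = rng.uniform(size=x.shape) >= rate
    return x * keep / (1.0 - rate)


rng = np.random.RandomState(0)
x = np.ones((2, 10), dtype=np.float32)

train_a = dropout(x, 0.2, training=True, rng=rng)
train_b = dropout(x, 0.2, training=True, rng=rng)
test_a = dropout(x, 0.2, training=False, rng=rng)
test_b = dropout(x, 0.2, training=False, rng=rng)

print("train outputs equal:", bool(np.array_equal(train_a, train_b)))
print("test outputs equal: ", bool(np.array_equal(test_a, test_b)))
```

If the goal is a stable representation to feed into further computation, passing 0 as the learning phase is the usual choice.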
Using a higher-level approach, you can do this:
from keras.models import Model
newModel = Model(model.inputs,model.layers[6].output)
Do whatever you want with newModel. You can train it (and affect the original model), and use it to predict values.
Keras LSTM input features and incorrect dimensional data input
So I'm trying to practice using LSTMs in Keras, and the (samples, timesteps, features) 3D input is confusing me. I have some stock data: if the next item in the list moves beyond the threshold of 5, which is +-2.50, it buys or sells; if it stays within that threshold, it holds. These are my labels: my Y. For my features, my X, I have a dataframe of [500, 1, 3]: 500 samples, 1 timestep each since each data point is a 1 hour increment, and 3 features. But I get this error:
ValueError: Error when checking model input: expected lstm_1_input to have 3 dimensions, but got array with shape (500, 3)
How can I fix this code and what am I doing wrong?
import json
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
"""
Sample of JSON file
{"time":"2017-01-02T01:56:14.000Z","usd":8.14},
{"time":"2017-01-02T02:56:14.000Z","usd":8.16},
{"time":"2017-01-02T03:56:15.000Z","usd":8.14},
{"time":"2017-01-02T04:56:16.000Z","usd":8.15}
"""
file = open("E.json", "r", encoding="utf8")
file = json.load(file)
"""
If the price jump of the next item is > or < +-2.50 then append 'Buy' or 'Sell'
If it's in the range of +- 2.50 then append 'Hold'
These are my classifier labels
"""
data = []
for row in range(len(file['data'])):
    row2 = row + 1
    if row2 == len(file['data']):
        break
    else:
        difference = file['data'][row]['usd'] - file['data'][row2]['usd']
        if difference > 2.50:
            data.append((file['data'][row]['usd'], 'SELL'))
        elif difference < -2.50:
            data.append((file['data'][row]['usd'], 'BUY'))
        else:
            data.append((file['data'][row]['usd'], 'HOLD'))
"""
add the price, the time step which is 1 and the features which is 3
"""
frame = pd.DataFrame(data)
features = pd.DataFrame()
# train LSTM
for x in range(500):
    series = pd.Series(data=[500, 1, frame.iloc[x][0]])
    features = features.append(series, ignore_index=True)
labels = frame.iloc[16000:16500][1]
# test
#yt = frame.iloc[16500:16512][0]
#xt = pd.get_dummies(frame.iloc[16500:16512][1])
# create LSTM
model = Sequential()
model.add(LSTM(3, input_shape=features.shape, activation='relu', return_sequences=False))
model.add(Dense(2, activation='relu'))
model.add(Dense(1, activation='relu'))
model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
model.fit(x=features.as_matrix(), y=labels.as_matrix())
"""
ERROR
Anaconda3\envs\Final\python.exe C:/Users/Def/PycharmProjects/Ether/Main.py
Using Theano backend.
Traceback (most recent call last):
  File "C:/Users/Def/PycharmProjects/Ether/Main.py", line 62, in <module>
    model.fit(x=features.as_matrix(), y=labels.as_matrix())
  File "\Anaconda3\envs\Final\lib\site-packages\keras\models.py", line 845, in fit
    initial_epoch=initial_epoch)
  File "\Anaconda3\envs\Final\lib\site-packages\keras\engine\training.py", line 1405, in fit
    batch_size=batch_size)
  File "\Anaconda3\envs\Final\lib\site-packages\keras\engine\training.py", line 1295, in _standardize_user_data
    exception_prefix='model input')
  File "\Anaconda3\envs\Final\lib\site-packages\keras\engine\training.py", line 121, in _standardize_input_data
    str(array.shape))
ValueError: Error when checking model input: expected lstm_1_input to have 3 dimensions, but got array with shape (500, 3)
"""
Thanks.
This is my first post here, so I hope it is useful; I will do my best.
First, you need to create a 3-dimensional array to work with input_shape in Keras. You can read about this in the Keras documentation or, better, look at the built-in help:
from keras.models import Sequential
Sequential?
Linear stack of layers.
Arguments
    layers: list of layers to add to the model.
# Note
The first layer passed to a Sequential model should have a defined input shape. What that means is that it should have received an input_shape or batch_input_shape argument, or for some type of layers (recurrent, Dense...) an input_dim argument.
Example
```python
model = Sequential()
# first layer must have a defined input shape
model.add(Dense(32, input_dim=500))
# afterwards, Keras does automatic shape inference
model.add(Dense(32))

# also possible (equivalent to the above):
model = Sequential()
model.add(Dense(32, input_shape=(500,)))
model.add(Dense(32))

# also possible (equivalent to the above):
model = Sequential()
# here the batch dimension is None,
# which means any batch size will be accepted by the model.
model.add(Dense(32, batch_input_shape=(None, 500)))
model.add(Dense(32))
```
After that, to transform a 2-dimensional array into a 3-dimensional one, look at np.newaxis.
Useful commands that help more than you might expect: Sequential?, Sequential??, print(list(dir(Sequential)))
Best
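As a minimal sketch of the np.newaxis step, assuming the features are held in a 2D array of shape (500, 3) as in the error message, a timestep axis of length 1 can be inserted like this (the random data is a placeholder):

```python
import numpy as np

# Placeholder for features.as_matrix(): 500 samples, 3 features.
features_2d = np.random.RandomState(0).rand(500, 3)

# Insert a timesteps axis so the shape becomes (samples, timesteps, features).
features_3d = features_2d[:, np.newaxis, :]

print(features_2d.shape, "->", features_3d.shape)
```

The LSTM layer's `input_shape` would then describe a single sample, i.e. `(1, 3)`, rather than `features.shape`.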