Trying to understand Pytorch's implementation of LSTM - machine-learning

I have a dataset containing 1000 examples where each example has 5 features (a,b,c,d,e). I want to feed 7 examples to an LSTM so it predicts the feature (a) of the 8th day.
Reading Pytorchs documentation of nn.LSTM() I came up with the following:
input_size = 5
hidden_size = 10
num_layers = 1
output_size = 1
lstm = nn.LSTM(input_size, hidden_size, num_layers)
fc = nn.Linear(hidden_size, output_size)
out, hidden = lstm(X) # Where X's shape is ([7,1,5])
output = fc(out[-1])
output # output's shape is ([7,1])
According to the docs:
The input of the nn.LSTM is "input of shape (seq_len, batch, input_size)" with "input_size – The number of expected features in the input x",
And the output is: "output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the LSTM, for each t."
In this case, I thought seq_len would be the sequence of 7 examples, batchis 1 and input_size is 5. So the lstm would consume each example containing 5 features refeeding the hidden layer every iteration.
What am I missing?

When I extend your code to a full example -- I also added some comments to may help -- I get the following:
import torch
import torch.nn as nn
input_size = 5
hidden_size = 10
num_layers = 1
output_size = 1
lstm = nn.LSTM(input_size, hidden_size, num_layers)
fc = nn.Linear(hidden_size, output_size)
X = [
[[1,2,3,4,5]],
[[1,2,3,4,5]],
[[1,2,3,4,5]],
[[1,2,3,4,5]],
[[1,2,3,4,5]],
[[1,2,3,4,5]],
[[1,2,3,4,5]],
]
X = torch.tensor(X, dtype=torch.float32)
print(X.shape) # (seq_len, batch_size, input_size) = (7, 1, 5)
out, hidden = lstm(X) # Where X's shape is ([7,1,5])
print(out.shape) # (seq_len, batch_size, hidden_size) = (7, 1, 10)
out = out[-1] # Get output of last step
print(out.shape) # (batch, hidden_size) = (1, 10)
out = fc(out) # Push through linear layer
print(out.shape) # (batch_size, output_size) = (1, 1)
This makes sense to me, given your batch_size = 1 and output_size = 1 (I assume, you're doing regression). I don't know where your output.shape = (7, 1) come from.
Are you sure that your X has the correct dimensions? Did you create nn.LSTM maybe with batch_first=True? There are lot of little things that can sneak in.

Related

Understanding the output of LSTM predictions

It's a 15-class classification model, OUTPUT_DIM = 15. I'm trying to input a frequency vector like this one 'hi my name is' => [1,43,2,56].
When I call predictions = model(x_train[0]) I get an array of size torch.Size([100, 15]), instead of just a 1D array with 15 classes like this: torch.Size([15]). What's happening? Why is this the output? How can I fix it? Thank you in advance, more info below.
The model (from main docs) is the following:
import torch.nn as nn
class RNN(nn.Module):
def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim):
super().__init__()
self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)
self.lstm = nn.LSTM(embedding_dim, hidden_dim)
self.hidden2tag = nn.Linear(hidden_dim, output_dim)
def forward(self, text):
embeds = self.word_embeddings(text)
lstm_out, _ = self.lstm(embeds.view(len(text), 1, -1))
tag_space = self.hidden2tag(lstm_out.view(len(text), -1))
tag_scores = F.log_softmax(tag_space, dim=1)
return tag_scores
Parameters:
INPUT_DIM = 62288
EMBEDDING_DIM = 64
HIDDEN_DIM = 128
OUTPUT_DIM = 15
The LSTM function in Pytorch returns not just the output of the last timestep but all outputs instead (this is useful in some cases). So in your example you seem to have exactly 100 timesteps (the amount of timesteps is just your sequence length).
But since you are doing classification you just care about the last output. You can normally get it like this:
outputs, _ = self.lstm(embeddings)
# shape: batch_size x 100 x 15
output = outputs[:, -1]
# shape: batch_size x 1 x 15

Pytorch Multiclass Logistic Regression Type Errors

I'm new to ML and even more naive with Pytorch. Here's the problem. (I've skipped certain parts like the random_split() which seem to work just fine)
I've to predict wine quality (red) which from the dataset is the last column with 6 classes
That's what my dataset looks like
The link to the dataset (winequality-red.csv)
features = df.drop(['quality'], axis = 1)
targets = df.iloc[:, -1] # theres 6 classes
dataset = TensorDataset(torch.Tensor(np.array(features)).float(), torch.Tensor(targets).float())
# here's where I think the error might be, but I might be wrong
batch_size = 8
# Dataloader
train_loader = DataLoader(train_ds, batch_size, shuffle = True)
val_loader = DataLoader(val_ds, batch_size)
test_ds = DataLoader(test_ds, batch_size)
input_size = len(df.columns) - 1
output_size = 6
threshold = .5
class WineModel(nn.Module):
def __init__(self):
super().__init__()
self.linear = nn.Linear(input_size, output_size)
def forward(self, xb):
out = self.linear(xb)
return out
model = WineModel()
n_iters = 2000
num_epochs = n_iters / (len(train_ds) / batch_size)
num_epochs = int(num_epochs)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
# the part below returns the error on running
iter = 0
for epoch in range(num_epochs):
for i, (x, y) in enumerate(train_loader):
optimizer.zero_grad()
outputs = model(x)
loss = criterion(outputs, y)
loss.backward()
optimizer.step()
RuntimeError: expected scalar type Long but found Float
Hopefully that is sufficient info
The targets for nn.CrossEntropyLoss are given as the class indices, which are required to be integers, to be precise they need to be of type torch.long, which is equivalent to torch.int64.
You converted the targets to floats, but you should convert them to longs:
dataset = TensorDataset(torch.Tensor(np.array(features)).float(), torch.Tensor(targets).long())
Since the targets are the indices of the classes, they must be in range [0, num_classes - 1]. As you have 6 classes that would be in range [0, 5]. Having a quick look at your data, the quality uses values in range [3, 8]. Even though you have 6 classes, the values cannot be used directly as the classes. If you list the classes as classes = [3, 4, 5, 6, 7, 8], you can see that the first class is 3, classes[0] == 3, up to the last class being classes[5] == 8.
You need to replace the class values with the indices, just like you would for named classes (e.g. if you had the classes dog and cat, dog would be 0 and cat would be 1), but you can avoid having to look them up, since the values are simply shifted by 3, i.e. index = classes[index] - 3. Therefore you can subtract 3 from the entire target tensor:
torch.Tensor(targets).long() - 3

Creating and shaping data for 1D CNN

I have a (244, 108) numpy array. It contains percentage change of close value of a trade for each minute in one day ie 108 values and like that for 244 days. Basically its a 1D vector. So in order to do 1D CNN how should I shape my data?
What i have done:
x.shape = (244, 108)
x = np.expand_dims(x, axis=2)
x.shape = (243, 108, 1)
y.shape = (243,)
Model:
class Net(torch.nn.Module):
def __init__(self):
super(Net, self).__init__()
self.layer1 = torch.nn.Conv1d(in_channels=108, out_channels=1, kernel_size=1, stride=1)
self.act1 = torch.nn.ReLU()
self.act2 = torch.nn.MaxPool1d(kernel_size=1, stride=1)
self.layer2 = torch.nn.Conv1d(in_channels=1, out_channels=1, kernel_size=1, stride=1)
self.act3 = torch.nn.ReLU()
self.act4 = torch.nn.MaxPool1d(kernel_size=1, stride=1)
self.linear_layers = nn.Linear(1, 1)
# Defining the forward pass
def forward(self, x):
x = self.layer1(x)
x = self.act1(x)
x = self.act2(x)
x = self.layer2(x)
x = self.act3(x)
x = self.act4(x)
x = self.linear_layers(x)
return x
If each day should be separate instance for convolution your data should have the shape (248, 1, 108). This seems more reasonable.
If you want all your days and minutes to be a continuum for network to learn it should be of shape (1, 1, 248*108).
Basically first dimension is batch (how many training samples), second is the number of channels or features of sample (only one in your case) and last is the number of timesteps.
Edit
Your pooling layer should be torch.nn.AdaptiveMaxPool1d(1). You should also reshape output from this layer like this: pooled.reshape(x.shape[0], -1) before pushing it through torch.nn.Linear layer.

Tensorflow dynamic_rnn TypeError: 'Tensor' object is not iterable

I'm trying to get a basic LSTM working in TensorFlow. I'm receiving the following error:
TypeError: 'Tensor' object is not iterable.
The offending line is:
rnn_outputs, final_state = tf.nn.dynamic_rnn(cell, x, sequence_length=seqlen,
initial_state=init_state,)`
I'm using version 1.0.1 on windows 7. My inputs and label have the following shapes
x_shape = (50, 40, 18), y_shape = (50, 40)
Where:
batch size = 50
sequence length = 40
input vector length at each step = 18
I'm building my graph as follows
def build_graph(learn_rate, seq_len, state_size=32, batch_size=5):
# use a fixed sequence length
seqlen = tf.constant(seq_len, shape=[batch_size],dtype=tf.int32)
# Placeholders
x = tf.placeholder(tf.float32, [batch_size, None, 18])
y = tf.placeholder(tf.float32, [batch_size, None])
keep_prob = tf.constant(1.0)
# RNN
cell = tf.contrib.rnn.LSTMCell(state_size)
init_state = tf.get_variable('init_state', [1, state_size],
initializer=tf.constant_initializer(0.0))
init_state = tf.tile(init_state, [batch_size, 1])
rnn_outputs, final_state = tf.nn.dynamic_rnn(cell, x, sequence_length=seqlen,
initial_state=init_state,)
# Add dropout, as the model otherwise quickly overfits
rnn_outputs = tf.nn.dropout(rnn_outputs, keep_prob)
# Prediction layer
with tf.variable_scope('prediction'):
W = tf.get_variable('W', [state_size, num_classes])
b = tf.get_variable('b', [num_classes], initializer=tf.constant_initializer(0.0))
preds = tf.tanh(tf.matmul(rnn_outputs, W) + b)
# MSE
loss = tf.square(tf.subtract(y, preds))
# loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits, y))
train_step = tf.train.AdamOptimizer(learn_rate).minimize(loss)
Can anyone tell me what I am missing?
Sequence length should be iterable e.g. a list or tensor, not a scalar. In your case specifically, you need to replace sequence length = 40 with a list of the lengths of each input. For instance, if your first sequence has 10 steps, the second 13 and the third 18, you would pass in [10, 13, 18]. This lets TensorFlow's dynamic RNN know how many steps to unroll for (I believe it uses a while loop internally).

How to interpret predictions in TensorFlow, they seem to have the wrong shape

I have a TensorFlow model with some of these characteristics:
state_size = 800,
num_classes = 14313,
batch_size = 10,
num_steps = 16, # width of the tensor
num_layers = 3
x = tf.placeholder(tf.int32, [batch_size, num_steps], name='input_placeholder')
y = tf.placeholder(tf.int32, [batch_size, num_steps], name='labels_placeholder')
rnn_inputs = [tf.squeeze(i, squeeze_dims=[1]) for i in
tf.split(x_one_hot, num_steps, 1)] # still a list of tensors (batch_size, num_classes)
...
logits = tf.matmul(rnn_outputs, W) + b
predictions = tf.nn.softmax(logits)
Now I want to feed it a np.array (shape = batch_size x num_steps, so 10 x 16) and I get a predictions tensor back.
Weirdly, its shape is 160 x 14313. The latter is the number of classes. But where does 160 come from? I don't understand that. I would like to have a probability for each of my classes, for each of the elements of the batch (which is 10). How did the num_steps become involved, how to I read from this pred. tensor which is the expected element after those 16 numbers?
In this case the 160 comes from the shape as you suspected.
which means that for each batch of 10, has 16 timesteps, this is technically flattened when you do your shape variable.
At this point you have logits of shape 160 * classes. so you can do predictions[i] for each batch which then will have the probability of each class being the desired class.
which is why to get the chosen class you would do something like tf.argmax(predictions, 1) to get a tensor with the classification
this will have a shape of 160 in your case, so it will be the predicted
class for each one of the batches.
In order to get the probabilities, you could use the logits:
def prob(logit):
return 1/(1 + np.exp(-logit)

Resources