Can someone give a full working code (not a snippet, but something that runs on a variable-length recurrent neural network) on how would you use the PackedSequence method in PyTorch?
There do not seem to be any examples of this in the documentation, github, or the internet.
https://github.com/pytorch/pytorch/releases/tag/v0.1.10
Not the most beautiful piece of code, but this is what I gathered for my personal use after going through PyTorch forums and docs. There can be certainly better ways to handle the sorting - restoring part, but I chose it to be in the network itself
EDIT: See answer from #tusonggao which makes torch utils take care of sorting parts
class Encoder(nn.Module):
def __init__(self, vocab_size, embedding_size, embedding_vectors=None, tune_embeddings=True, use_gru=True,
hidden_size=128, num_layers=1, bidrectional=True, dropout=0.6):
super(Encoder, self).__init__()
self.embed = nn.Embedding(vocab_size, embedding_size, padding_idx=0)
self.embed.weight.requires_grad = tune_embeddings
if embedding_vectors is not None:
assert embedding_vectors.shape[0] == vocab_size and embedding_vectors.shape[1] == embedding_size
self.embed.weight = nn.Parameter(torch.FloatTensor(embedding_vectors))
cell = nn.GRU if use_gru else nn.LSTM
self.rnn = cell(input_size=embedding_size, hidden_size=hidden_size, num_layers=num_layers,
batch_first=True, bidirectional=True, dropout=dropout)
def forward(self, x, x_lengths):
sorted_seq_lens, original_ordering = torch.sort(torch.LongTensor(x_lengths), dim=0, descending=True)
ex = self.embed(x[original_ordering])
pack = torch.nn.utils.rnn.pack_padded_sequence(ex, sorted_seq_lens.tolist(), batch_first=True)
out, _ = self.rnn(pack)
unpacked, unpacked_len = torch.nn.utils.rnn.pad_packed_sequence(out, batch_first=True)
indices = Variable(torch.LongTensor(np.array(unpacked_len) - 1).view(-1, 1)
.expand(unpacked.size(0), unpacked.size(2))
.unsqueeze(1))
last_encoded_states = unpacked.gather(dim=1, index=indices).squeeze(dim=1)
scatter_indices = Variable(original_ordering.view(-1, 1).expand_as(last_encoded_states))
encoded_reordered = last_encoded_states.clone().scatter_(dim=0, index=scatter_indices, src=last_encoded_states)
return encoded_reordered
Actually there is no need to mind the sorting - restoring problem yourself, let the torch.nn.utils.rnn.pack_padded_sequence function do all the work, by setting the parameter enforce_sorted=False.
Then the returned PackedSequence object will carry the sorting related info in its sorted_indices and unsorted_indicies attributes, which can be used properly by the followed nn.GRU or nn.LSTM to restore the original index order.
Runnable code example:
import torch
from torch import nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence
data = [torch.tensor([1]),
torch.tensor([2, 3, 4, 5]),
torch.tensor([6, 7]),
torch.tensor([8, 9, 10])]
lengths = [d.size(0) for d in data]
padded_data = pad_sequence(data, batch_first=True, padding_value=0)
embedding = nn.Embedding(20, 5, padding_idx=0)
embeded_data = embedding(padded_data)
packed_data = pack_padded_sequence(embeded_data, lengths, batch_first=True, enforce_sorted=False)
lstm = nn.LSTM(5, 5, batch_first=True)
o, (h, c) = lstm(packed_data)
# (h, c) is the needed final hidden and cell state, with index already restored correctly by LSTM.
# but o is a PackedSequence object, to restore to the original index:
unpacked_o, unpacked_lengths = pad_packed_sequence(o, batch_first=True)
# now unpacked_o, (h, c) is just like the normal output you expected from a lstm layer.
print(unpacked_o, unpacked_lengths)
We get the output of unpacked_o, unpacked_lengths something like follows:
# output (unpacked_o, unpacked_lengths):
tensor([[[ 1.5230, -1.7530, 0.5462, 0.6078, 0.9440],
[ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]],
[[ 1.8888, -0.5465, 0.5404, 0.4132, -0.3266],
[ 0.1657, 0.5875, 0.4556, -0.8858, 1.1443],
[ 0.8957, 0.8676, -0.6614, 0.6751, -1.2377],
[-1.8999, 2.8260, 0.1650, -0.6244, 1.0599]],
[[ 0.0637, 0.3936, -0.4396, -0.2788, 0.1282],
[ 0.5443, 0.7401, 1.0287, -0.1538, -0.2202],
[ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]],
[[-0.5008, 2.1262, -0.3623, 0.5864, 0.9871],
[-0.6996, -0.3984, 0.4890, -0.8122, -1.0739],
[ 0.3392, 1.1305, -0.6669, 0.5054, -1.7222],
[ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]]],
grad_fn=<IndexSelectBackward>) tensor([1, 4, 2, 3])
Comparing it with the original data and lengths, we can find the sorting - restoring problem has been neatly taken care of.
Related
I want to use GradCAM activation to infer the obeject location on the image, it is not really a problem if there is only one contour heatmap since I can simply use argmax to get that. But I want to be able to grab more than one heatmap hoping that with more than one heatmap we can point the location more accurately.
Here is the example
import matplotlib.pyplot as plt
activation_map = [[0.0724, 0.0615, 0.0607, 0.0710, 0.0000, 0.0000, 0.0154],
[0.1111, 0.0835, 0.0923, 0.0409, 0.0000, 0.0000, 0.0000],
[0.0986, 0.0860, 0.1138, 0.0706, 0.0144, 0.0000, 0.0000],
[0.1134, 0.1109, 0.2244, 0.3414, 0.2652, 0.2708, 0.1664],
[0.1165, 0.1620, 0.5605, 0.7064, 0.4593, 0.6628, 0.6103],
[0.0852, 0.2324, 1.0000, 0.8605, 0.5095, 0.8457, 0.8332],
[0.0349, 0.2422, 0.9287, 0.5717, 0.2054, 0.4749, 0.6983]]
plt.imshow(activation_map)
After I rescale this activation map to proper scaling it looks something like this
import cv2
activation_map_resized = cv2.resize(np.array(activation_map), (64, 64))
plt.imshow(activation_map_resized)
I want to be able to get the point around (22, 50) and (55, 50).
Any method I can use to find the center of that two contour heatmap efficiently? I can imagine using gradient or some clustering, but I'm not sure if it is the most effiient method to use.
What you are looking for is called peak detection, in your case you'll have to threshold the data first as the gradient has some "low" peaks in it that you would like to ignore
import numpy as np
from scipy.ndimage.filters import maximum_filter
from scipy.ndimage.morphology import generate_binary_structure, binary_erosion
import matplotlib.pyplot as pp
def detect_peaks(image):
neighborhood = generate_binary_structure(2,2)
local_max = maximum_filter(image, footprint=neighborhood)==image
background = (image==0)
eroded_background = binary_erosion(background, structure=neighborhood, border_value=1)
detected_peaks = local_max ^ eroded_background
return detected_peaks
activation_map = np.array([[0.0724, 0.0615, 0.0607, 0.0710, 0.0000, 0.0000, 0.0154],
[0.1111, 0.0835, 0.0923, 0.0409, 0.0000, 0.0000, 0.0000],
[0.0986, 0.0860, 0.1138, 0.0706, 0.0144, 0.0000, 0.0000],
[0.1134, 0.1109, 0.2244, 0.3414, 0.2652, 0.2708, 0.1664],
[0.1165, 0.1620, 0.5605, 0.7064, 0.4593, 0.6628, 0.6103],
[0.0852, 0.2324, 1.0000, 0.8605, 0.5095, 0.8457, 0.8332],
[0.0349, 0.2422, 0.9287, 0.5717, 0.2054, 0.4749, 0.6983]])
#Thresholding - remove this line to expose "low" peaks
activation_map[activation_map<0.3] = 0
detected_peaks = detect_peaks(activation_map)
pp.subplot(4,2,1)
pp.imshow(activation_map)
pp.subplot(4,2,2)
pp.imshow(detected_peaks)
pp.show()
Reference/more detailed info about peak detection:
Peak detection in a 2D array
Putting together different base and documentation examples, I have managed to come up with this:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
def objective(config, reporter):
for i in range(config['iterations']):
model = RandomForestClassifier(random_state=0, n_jobs=-1, max_depth=None, n_estimators= int(config['n_estimators']), min_samples_split=int(config['min_samples_split']), min_samples_leaf=int(config['min_samples_leaf']))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Feed the score back to tune?
reporter(precision=precision_score(y_test, y_pred, average='macro'))
space = {'n_estimators': (100,200),
'min_samples_split': (2, 10),
'min_samples_leaf': (1, 5)}
algo = BayesOptSearch(
space,
metric="precision",
mode="max",
utility_kwargs={
"kind": "ucb",
"kappa": 2.5,
"xi": 0.0
},
verbose=3
)
scheduler = AsyncHyperBandScheduler(metric="precision", mode="max")
config = {
"num_samples": 1000,
"config": {
"iterations": 10,
}
}
results = run(objective,
name="my_exp",
search_alg=algo,
scheduler=scheduler,
stop={"training_iteration": 400, "precision": 0.80},
resources_per_trial={"cpu":2, "gpu":0.5},
**config)
print(results.dataframe())
print("Best config: ", results.get_best_config(metric="precision"))
It runs and I am able to get a best configuration at the end of everything. However, my doubt mainly lies in the objective function. Do I have that properly written? There are no samples that I could find
Follow up question:
What is num_samples in the config object? Is it the number of samples it will extract from the overall training data for each trial?
Tune now has native sklearn bindings: https://github.com/ray-project/tune-sklearn
Can you give that a shot instead?
To answer your original question, the objective function looks good; and num_samples is the total number of hyperparameter configurations you want to try.
Also, you'll want to remove the forloop from your training function:
def objective(config, reporter):
model = RandomForestClassifier(random_state=0, n_jobs=-1, max_depth=None, n_estimators= int(config['n_estimators']), min_samples_split=int(config['min_samples_split']), min_samples_leaf=int(config['min_samples_leaf']))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Feed the score back to tune
reporter(precision=precision_score(y_test, y_pred, average='macro'))
Im working on a speaker recognition Neural Network.
What I am doing is taking wav files [ of the Bing Bang Theory first espiode :-) ], than convert it to MFCC coeffs than I make it as an input to an open source api of Neural Network (MLPClassifier) and as output I define a unique vector to each speaker ( Let's say : [1,0,0,0] - sheldon; [0,1,0,0] - Penny; and ect... ), I take 50 random values for testing and the others for fitting ( training )
This is my code, At the begining I got about random accucary for the NN but after some help of amazing guy I improved it to ~42% but I want more :) about 70% :
from sklearn.neural_network import MLPClassifier
import python_speech_features
import scipy.io.wavfile as wav
import numpy as np
from os import listdir
from os.path import isfile, join
from random import shuffle
import matplotlib.pyplot as plt
from tqdm import tqdm
from random import randint
import random
winner = [] # this array count how much Bingo we had when we test the NN
random_winner = []
win_len = 0.04 # in seconds
step = win_len / 2
nfft = 2048
for TestNum in tqdm(range(20)): # in every round we build NN with X,Y that out of them we check 50 after we build the NN
X = []
Y = []
onlyfiles = [f for f in listdir("FinalAudios/") if isfile(join("FinalAudios/", f))] # Files in dir
names = [] # names of the speakers
for file in onlyfiles: # for each wav sound
# UNESSECERY TO UNDERSTAND THE CODE
if " " not in file.split("_")[0]:
names.append(file.split("_")[0])
else:
names.append(file.split("_")[0].split(" ")[0])
only_speakers = [] + names
#print only_speakers
names = list(dict.fromkeys(names)) # names of speakers
print names
vector_names = [] # vector for each name
i = 0
vector_for_each_name = [0] * len(names)
for name in names:
vector_for_each_name[i] += 1
vector_names.append(np.array(vector_for_each_name))
vector_for_each_name[i] -= 1
i += 1
for f in onlyfiles:
if " " not in f.split("_")[0]:
f_speaker = f.split("_")[0]
else:
f_speaker = f.split("_")[0].split(" ")[0]
fs, audio = wav.read("FinalAudios/" + f) # read the file
try:
mfcc_feat = python_speech_features.mfcc(audio, samplerate=fs, winlen=win_len,
winstep=step, nfft=nfft, appendEnergy=False)
flat_list = [item for sublist in mfcc_feat for item in sublist]
X.append(np.array(flat_list))
Y.append(np.array(vector_names[names.index(f_speaker)]))
except IndexError:
pass
Z = list(zip(X, Y))
shuffle(Z) # WE SHUFFLE X,Y TO PERFORM RANDOM ON THE TEST LEVEL
X, Y = zip(*Z)
X = list(X)
Y = list(Y)
X = np.asarray(X)
Y = np.asarray(Y)
Y_test = Y[:50] # CHOOSE 50 FOR TEST, OTHERS FOR TRAIN
X_test = X[:50]
X = X[50:]
Y = Y[50:]
print len(X)
clf = MLPClassifier(solver='lbfgs', alpha=3e-2, hidden_layer_sizes=(50, 20), random_state=2) # create the NN
clf.fit(X, Y) # Train it
print list(clf.predict_proba([X[0]])[0])
print list(Y_test[0])
for sample in range(len(X_test)): # add 1 to winner array if we correct and 0 if not, than in the end it plot it
arr = list(clf.predict([X_test[sample]])[0])
if arr.index(max(arr)) == list(Y_test[sample]).index(1):
winner.append(1)
else:
winner.append(0)
if only_speakers[randint(0, len(only_speakers) - 1)] == only_speakers[randint(0, len(only_speakers) - 1)]:
random_winner.append(1)
else:
random_winner.append(0)
# plot winner
plot_x = []
plot_y = []
for i in range(1, len(winner)):
plot_y.append(sum(winner[0:i])*1.0/len(winner[0:i]))
plot_x.append(i)
plot_random_x = []
plot_random_y = []
for i in range(1, len(random_winner)):
plot_random_y.append(sum(random_winner[0:i])*1.0/len(random_winner[0:i]))
plot_random_x.append(i)
plt.plot(plot_x, plot_y, 'r', label='machine learning')
plt.plot(plot_random_x, plot_random_y, 'b', label='random')
plt.xlabel('Number Of Samples')
# naming the y axis
plt.ylabel('Success Rate')
# giving a title to my graph
plt.title('Success Rate : Random Vs ML!')
# function to show the plot
plt.show()
This is my zip file that contains the code and the audio file : https://ufile.io/eggjm1gw
Somebody have an idea how can I improve my accucary?
Edit :
I improved my data set and put convolution model and got 60% accucarry, which is ok but also not good enoguh
import python_speech_features
import scipy.io.wavfile as wav
import numpy as np
from os import listdir
import os
import shutil
from os.path import isfile, join
from random import shuffle
from matplotlib import pyplot
from tqdm import tqdm
from random import randint
import tensorflow as tf
from ast import literal_eval as str2arr
from tempfile import TemporaryFile
#win_len = 0.04 # in seconds
#step = win_len / 2
#nfft = 2048
win_len = 0.05 # in seconds
step = win_len
nfft = 16384
results = []
outfile_x = None
outfile_y = None
winner = []
for TestNum in tqdm(range(40)): # We check it several times
if not outfile_x: # if path not exist we create it
X = [] # inputs
Y = [] # outputs
onlyfiles = [f for f in listdir("FinalAudios") if isfile(join("FinalAudios", f))] # Files in dir
names = [] # names of the speakers
for file in onlyfiles: # for each wav sound
# UNESSECERY TO UNDERSTAND THE CODE
if " " not in file.split("_")[0]:
names.append(file.split("_")[0])
else:
names.append(file.split("_")[0].split(" ")[0])
only_speakers = [] + names
namesWithoutDuplicate = list(dict.fromkeys(names))
namesWithoutDuplicateCopy = namesWithoutDuplicate[:]
for name in namesWithoutDuplicateCopy: # we remove low samples files
if names.count(name) < 107:
namesWithoutDuplicate.remove(name)
names = namesWithoutDuplicate
print(names) # print it
vector_names = [] # output for each name
i = 0
for name in names:
vector_for_each_name = i
vector_names.append(np.array(vector_for_each_name))
i += 1
for f in onlyfiles: # for all the files
if " " not in f.split("_")[0]:
f_speaker = f.split("_")[0]
else:
f_speaker = f.split("_")[0].split(" ")[0]
if f_speaker in namesWithoutDuplicate:
fs, audio = wav.read("FinalAudios\\" + f) # read the file
try:
# compute MFCC
mfcc_feat = python_speech_features.mfcc(audio, samplerate=fs, winlen=win_len, winstep=step, nfft=nfft, appendEnergy=False)
#flat_list = [item for sublist in mfcc_feat for item in sublist]
# Create output + inputs
for i in mfcc_feat:
X.append(np.array(i))
Y.append(np.array(vector_names[names.index(f_speaker)]))
except IndexError:
pass
else:
if not os.path.exists("TooLowSamples"): # if path not exist we create it
os.makedirs("TooLowSamples")
shutil.move("FinalAudios\\" + f, "TooLowSamples\\" + f)
outfile_x = TemporaryFile()
np.save(outfile_x, X)
outfile_y = TemporaryFile()
np.save(outfile_y, Y)
# ------------------- RANDOMIZATION, UNNECESSARY TO UNDERSTAND THE CODE ------------------- #
else:
outfile_x.seek(0)
X = np.load(outfile_x)
outfile_y.seek(0)
Y = np.load(outfile_y)
Z = list(zip(X, Y))
shuffle(Z) # WE SHUFFLE X,Y TO PERFORM RANDOM ON THE TEST LEVEL
X, Y = zip(*Z)
X = list(X)
Y = list(Y)
lenX = len(X)
# ------------------- RANDOMIZATION, UNNECESSARY TO UNDERSTAND THE CODE ------------------- #
y_test = np.asarray(Y[:4000]) # CHOOSE 100 FOR TEST, OTHERS FOR TRAIN
x_test = np.asarray(X[:4000]) # CHOOSE 100 FOR TEST, OTHERS FOR TRAIN
x_train = np.asarray(X[4000:]) # CHOOSE 100 FOR TEST, OTHERS FOR TRAIN
y_train = np.asarray(Y[4000:]) # CHOOSE 100 FOR TEST, OTHERS FOR TRAIN
x_val = x_train[-4000:] # FROM THE TRAIN CHOOSE 100 FOR VALIDATION
y_val = y_train[-4000:] # FROM THE TRAIN CHOOSE 100 FOR VALIDATION
x_train = x_train[:-4000] # FROM THE TRAIN CHOOSE 100 FOR VALIDATION
y_train = y_train[:-4000] # FROM THE TRAIN CHOOSE 100 FOR VALIDATION
x_train = x_train.reshape(np.append(x_train.shape, (1, 1))) # RESHAPE FOR INPUT
x_test = x_test.reshape(np.append(x_test.shape, (1, 1))) # RESHAPE FOR INPUT
x_val = x_val.reshape(np.append(x_val.shape, (1, 1))) # RESHAPE FOR INPUT
features_shape = x_val.shape
# -------------- OUR TENSOR FLOW NEURAL NETWORK MODEL -------------- #
model = tf.keras.models.Sequential([
tf.keras.layers.Input(name='inputs', shape=(13, 1, 1), dtype='float32'),
tf.keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same', strides=1, name='block1_conv', input_shape=(13, 1, 1)),
tf.keras.layers.MaxPooling2D((3, 3), strides=(2,2), padding='same', name='block1_pool'),
tf.keras.layers.BatchNormalization(name='block1_norm'),
tf.keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same', strides=1, name='block2_conv',
input_shape=(13, 1, 1)),
tf.keras.layers.MaxPooling2D((3, 3), strides=(2, 2), padding='same', name='block2_pool'),
tf.keras.layers.BatchNormalization(name='block2_norm'),
tf.keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same', strides=1, name='block3_conv',
input_shape=(13, 1, 1)),
tf.keras.layers.MaxPooling2D((3, 3), strides=(2, 2), padding='same', name='block3_pool'),
tf.keras.layers.BatchNormalization(name='block3_norm'),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(64, activation='relu', name='dense'),
tf.keras.layers.BatchNormalization(name='dense_norm'),
tf.keras.layers.Dropout(0.2, name='dropout'),
tf.keras.layers.Dense(10, activation='softmax', name='pred')
])
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
# -------------- OUR TENSOR FLOW NEURAL NETWORK MODEL -------------- #
print("fitting")
history = model.fit(x_train, y_train, epochs=15, validation_data=(x_val, y_val))
print("testing")
results.append(model.evaluate(x_test, y_test)[1])
print(results)
print(sum(results)/len(results))
for i in range(10000):
f_1 = only_speakers[randint(0, len(only_speakers) - 1)]
f_2 = only_speakers[randint(0, len(only_speakers) - 1)]
if " " not in f_1.split("_")[0]:
f_speaker_1 = f_1.split("_")[0]
else:
f_speaker_1 =f_1.split("_")[0].split(" ")[0]
if " " not in f_2.split("_")[0]:
f_speaker_2 = f_2.split("_")[0]
else:
f_speaker_2 =f_2.split("_")[0].split(" ")[0]
if f_speaker_2 == f_speaker_1:
winner.append(1)
else:
winner.append(0)
print(sum(winner)/len(winner))
#]
# if onlyfiles[randint(len(onlyfiles) - 1)] == onlyfiles[randint(len(onlyfiles) - 1)]
#pyplot.plot(history.history['loss'], label='train')
#pyplot.plot(history.history['val_loss'], label='test') Q
#pyplot.legend()
#pyplot.show()
Readin your post these are the following things I could suggest you fix/explore
42% is not that impressive of an accuracy for the task you have at hand, consider the way you are cross-validating e.g. how do you split between a validation, test and training dataset
Your dataset seems very limited. Your task is to identify the speaker. A single episode might not be enough data for this task.
You might want to consider Deep Neural Network libraries such as Keras and Tensorflow. Convolutions is something you can apply directly to the MFC Graph.
If you decide using Tensorflow or Keras consider Triplet-Loss, where you preset a positive and negative example.
Consider reading the current state of the art for your task: https://github.com/grausof/keras-sincnet
Consider reading https://arxiv.org/abs/1503.03832 and adopting it for speech recognition.
The easiest thing you can do to improve your results is adding CNN layers to extract features from the MFCC
i'm trying to capture long-term dependencies using LSTM, by creating a unit pulse signal every 62 points.
The idea is to go back 62 time-steps and copy the value for the next time-step, so as to predict the pulse, but lstm is not doing this...
import sys
import os
import numpy as np
import math
import pandas as pd
from matplotlib import pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from keras.models import Sequential, load_model
from keras.layers import LSTM, Dense, Flatten, Dropout
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras import backend as K
import tensorflow as tf
from tensorflow.python.client import device_lib
K.clear_session() #pulire eventuali sessioni precedenti (cosi i nomi dei layer ripartono da 0)
print(K.tensorflow_backend._get_available_gpus())
print(device_lib.list_local_devices())
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
config = tf.ConfigProto( device_count = {'GPU': 1 , 'CPU': 4} )
sess = tf.Session(config=config)
K.set_session(sess)
# hyper-parametri
params = {
"batch_size": 20,
"epochs": 1000,
"time_steps": 70,
}
OUTPUT_PATH = "/home/..."
TIME_STEPS = params["time_steps"]
BATCH_SIZE = params["batch_size"]
def generate_impulse(dim):
arr = np.zeros(dim)
frequency = 62
for i in range(0, len(arr)):
if i % frequency == 0:
arr[i] = 1
return arr
y = generate_impulse(1300)
plt.figure(figsize=(20,5))
plt.plot(y)
plt.title('unit impulse')
plt.ylabel('y')
plt.xlabel('x')
plt.show()
dataset
def create_timeseries(arr):
# Costruzione time series univariata, predict di un single-step.
# Prende i primi TIME_STEPS valori come input e calcola il sin del valore TIME_STEPS+1
dim_0 = len(arr) - TIME_STEPS
x = np.zeros((dim_0, TIME_STEPS))
y = np.zeros((dim_0,))
for i in range(dim_0):
x[i] = arr[i:TIME_STEPS+i] #TIME_STEPS+i non compreso
y[i] = arr[TIME_STEPS+i]
#print(x[i], y[i])
print("length of time-series i/o",x.shape,y.shape)
return x, y
x_ts, y_ts = create_timeseries(y)
len_train = int(len(x_ts)*80/100)
len_val = int(len(x_ts)*10/100)
#DATASET DI TRAINING: 80%
x_train = x_ts[0:len_train]
y_train = y_ts[0:len_train]
#DATASET DI VALIDATION: 10%
x_val = x_ts[len_train:len_train+len_val]
y_val = y_ts[len_train:len_train+len_val]
#DATASET DI TEST 10%
x_test = x_ts[len_train+len_val:]
y_test = y_ts[len_train+len_val:]
x_train =x_train.reshape((x_train.shape[0], x_train.shape[1], 1))
x_val =x_val.reshape((x_val.shape[0], x_val.shape[1], 1))
x_test = x_test.reshape(x_test.shape[0], x_test.shape[1], 1)
def create_model():
model = Sequential()
model.add(LSTM(1, input_shape=(TIME_STEPS, 1)))
model.compile(optimizer='adam', loss='mse')
return model
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1,
patience=50, min_delta=0.0001)
model = create_model()
history = model.fit(x_train, y_train, epochs=params["epochs"], verbose=2, batch_size=BATCH_SIZE, shuffle=False,
validation_data=(x_val, y_val), callbacks=[es])
plt.figure()
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('MSE LOSS')
plt.ylabel('Loss')
plt.xlabel('Epochs')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()
mse loss
y_pred = model.predict(x_test, batch_size=BATCH_SIZE)
y_pred = y_pred.flatten()
error = mean_squared_error(y_test, y_pred)
plt.figure(figsize=(20,5))
plt.plot(y_pred)
plt.plot(y_test)
plt.title('PREDICTION ON TEST SET')
plt.ylabel('sin(x)')
plt.xlabel('x')
plt.legend(['Prediction', 'Real'], loc='upper left')
plt.show()
prediction test set
Training set give me the same results (it is the same signal..). I tried others LSTM models with more neurons but it doesn't work anyway.
You might consider training for more epochs. I created a simplified model and training set based on what I believe is the core of your idea:
from keras.models import Sequential
from keras.layers import LSTM
import numpy as np
TIME_STEPS=10
x_train = np.array([ [ [1],[0],[0],[0],[0],[0],[0],[0],[0],[0] ],
[ [0],[1],[0],[0],[0],[0],[0],[0],[0],[0] ],
[ [0],[0],[1],[0],[0],[0],[0],[0],[0],[0] ],
[ [0],[0],[0],[1],[0],[0],[0],[0],[0],[0] ],
[ [0],[0],[0],[0],[1],[0],[0],[0],[0],[0] ],
[ [0],[0],[0],[0],[0],[1],[0],[0],[0],[0] ],
[ [0],[0],[0],[0],[0],[0],[1],[0],[0],[0] ],
[ [0],[0],[0],[0],[0],[0],[0],[1],[0],[0] ],
[ [0],[0],[0],[0],[0],[0],[0],[0],[1],[0] ],
[ [0],[0],[0],[0],[0],[0],[0],[0],[0],[1] ]])
y_train = np.array([[1],[0],[0],[0],[0],[0],[0],[0],[0],[0]])
print(x_train.shape)
print(y_train.shape)
model = Sequential()
model.add(LSTM(1, input_shape=(TIME_STEPS,1)))
model.compile(optimizer='adam', loss='mse', metrics=['mse'])
model.fit(x_train, y_train, epochs=10000, verbose=0)
After training, I get the following predictions:
model.predict(x_train)
array([[ 0.9870746 ],
[ 0.00665453],
[-0.00303702],
[ 0.00697759],
[-0.02432432],
[-0.00701594],
[ 0.01387464],
[ 0.02281112],
[ 0.00439195],
[-0.04109564]], dtype=float32)
I'm not sure if it solves your problem completely, but it might give you a suggested direction to investigate. I hope this helps.
So this question is about GANs.
I am trying to do a trivial example for my own proof of concept; namely, generate images of hand written digits (MNIST). While most will approach this via deep convolutional gans (dgGANs), I am just trying to achieve this via the 1D array (i.e. instead of 28x28 gray-scale pixel values, a 28*28 1d array).
This git repo features a "vanilla" gans which treats the MNIST dataset as a 1d array of 784 values. Their output values look pretty acceptable so I wanted to do something similar.
Import statements
from __future__ import print_function
import matplotlib as mpl
from matplotlib import pyplot as plt
import mxnet as mx
from mxnet import nd, gluon, autograd
from mxnet.gluon import nn, utils
import numpy as np
import os
from math import floor
from random import random
import time
from datetime import datetime
import logging
ctx = mx.gpu()
np.random.seed(3)
Hyper parameters
batch_size = 100
epochs = 100
generator_learning_rate = 0.001
discriminator_learning_rate = 0.001
beta1 = 0.5
latent_z_size = 100
Load data
mnist = mx.test_utils.get_mnist()
# convert imgs to arrays
flattened_training_data = mnist["test_data"].reshape(10000, 28*28)
define models
G = nn.Sequential()
with G.name_scope():
G.add(nn.Dense(300, activation="relu"))
G.add(nn.Dense(28 * 28, activation="tanh"))
D = nn.Sequential()
with D.name_scope():
D.add(nn.Dense(128, activation="relu"))
D.add(nn.Dense(64, activation="relu"))
D.add(nn.Dense(32, activation="relu"))
D.add(nn.Dense(2, activation="tanh"))
loss = gluon.loss.SoftmaxCrossEntropyLoss()
init stuff
G.initialize(mx.init.Normal(0.02), ctx=ctx)
D.initialize(mx.init.Normal(0.02), ctx=ctx)
trainer_G = gluon.Trainer(G.collect_params(), 'adam', {"learning_rate": generator_learning_rate, "beta1": beta1})
trainer_D = gluon.Trainer(D.collect_params(), 'adam', {"learning_rate": discriminator_learning_rate, "beta1": beta1})
metric = mx.metric.Accuracy()
dynamic plot (for juptyer notebook)
import matplotlib.pyplot as plt
import time
def dynamic_line_plt(ax, y_data, colors=['r', 'b', 'g'], labels=['Line1', 'Line2', 'Line3']):
x_data = []
y_max = 0
y_min = 0
x_min = 0
x_max = 0
for y in y_data:
x_data.append(list(range(len(y))))
if max(y) > y_max:
y_max = max(y)
if min(y) < y_min:
y_min = min(y)
if len(y) > x_max:
x_max = len(y)
ax.set_ylim(y_min, y_max)
ax.set_xlim(x_min, x_max)
if ax.lines:
for i, line in enumerate(ax.lines):
line.set_xdata(x_data[i])
line.set_ydata(y_data[i])
else:
for i in range(len(y_data)):
l = ax.plot(x_data[i], y_data[i], colors[i], label=labels[i])
ax.legend()
fig.canvas.draw()
train
stamp = datetime.now().strftime('%Y_%m_%d-%H_%M')
logging.basicConfig(level=logging.DEBUG)
# arrays to store data for plotting
loss_D = nd.array([0], ctx=ctx)
loss_G = nd.array([0], ctx=ctx)
acc_d = nd.array([0], ctx=ctx)
labels = ['Discriminator Loss', 'Generator Loss', 'Discriminator Acc.']
%matplotlib notebook
fig, ax = plt.subplots(1, 1)
ax.set_xlabel('Time')
ax.set_ylabel('Loss')
dynamic_line_plt(ax, [loss_D.asnumpy(), loss_G.asnumpy(), acc_d.asnumpy()], labels=labels)
for epoch in range(epochs):
tic = time.time()
data_iter.reset()
for i, batch in enumerate(data_iter):
####################################
# Update Disriminator: maximize log(D(x)) + log(1-D(G(z)))
####################################
# extract batch of real data
data = batch.data[0].as_in_context(ctx)
# add noise
# Produce our noisey input to the generator
latent_z = mx.nd.random_normal(0,1,shape=(batch_size, latent_z_size), ctx=ctx)
# soft and noisy labels
# real_label = mx.nd.ones((batch_size, ), ctx=ctx) * nd.random_uniform(.7, 1.2, shape=(1)).asscalar()
# fake_label = mx.nd.ones((batch_size, ), ctx=ctx) * nd.random_uniform(0, .3, shape=(1)).asscalar()
# real_label = nd.random_uniform(.7, 1.2, shape=(batch_size), ctx=ctx)
# fake_label = nd.random_uniform(0, .3, shape=(batch_size), ctx=ctx)
real_label = mx.nd.ones((batch_size, ), ctx=ctx)
fake_label = mx.nd.zeros((batch_size, ), ctx=ctx)
with autograd.record():
# train with real data
real_output = D(data)
errD_real = loss(real_output, real_label)
# train with fake data
fake = G(latent_z)
fake_output = D(fake.detach())
errD_fake = loss(fake_output, fake_label)
errD = errD_real + errD_fake
errD.backward()
trainer_D.step(batch_size)
metric.update([real_label, ], [real_output,])
metric.update([fake_label, ], [fake_output,])
####################################
# Update Generator: maximize log(D(G(z)))
####################################
with autograd.record():
output = D(fake)
errG = loss(output, real_label)
errG.backward()
trainer_G.step(batch_size)
####
# Plot Loss
####
# append new data to arrays
loss_D = nd.concat(loss_D, nd.mean(errD), dim=0)
loss_G = nd.concat(loss_G, nd.mean(errG), dim=0)
name, acc = metric.get()
acc_d = nd.concat(acc_d, nd.array([acc], ctx=ctx), dim=0)
# plot array
dynamic_line_plt(ax, [loss_D.asnumpy(), loss_G.asnumpy(), acc_d.asnumpy()], labels=labels)
name, acc = metric.get()
metric.reset()
logging.info('Binary training acc at epoch %d: %s=%f' % (epoch, name, acc))
logging.info('time: %f' % (time.time() - tic))
output
img = G(mx.nd.random_normal(0,1,shape=(100, latent_z_size), ctx=ctx))[0].reshape((28, 28))
plt.imshow(img.asnumpy(),cmap='gray')
plt.show()
Now this doesn't get nearly as good as the repo's example from above. Although fairly similar.
Thus I was wondering if you could take a look and figure out why:
the colors are inverted
why the results are sub par
I have been fiddling around with this trying a lot of various things to improve the results (I will list this in a second), but for the MNIST dataset this really shouldn't be needed.
Things I have tried (and I have also tried a host of combinations):
increasing the generator network
increasing the discriminator network
using soft labeling
using noisy labeling
batch norm after every layer in the generator
batch norm of the data
normalizing all values between -1 and 1
leaky relus in the generator
drop out layers in the generator
increased learning rate of discriminator compared to generator
decreased learning rate of i compared to generator
Please let me know if you have any ideas.
1) If you look into original dataset:
training_set = mnist["train_data"].reshape(60000, 28, 28)
plt.imshow(training_set[10,:,:], cmap='gray')
you will notice that the digits are white on a black background. So, technically speaking, your results are not inversed - they match the pattern of original images you used as a real data.
If you want to invert colors for visualization purposes, you can easily do that by changing the pallete to reversed one by adding '_r' (it works for all color palletes):
plt.imshow(img.asnumpy(), cmap='gray_r')
You also can play with ranges of colors by changing vmin and vmax parameters. They control how big the difference between colors should be. By default it is calculated automatically based on provided set.
2) "Why the results are sub par" - I think this is exactly the reason why the community started to use dcGANs. To me the results in the git repo you provided are quite noisy. Surely, they are different from what you receive, and you can achieve the same quality just by changing your activation functions from tanh to sigmoid as in the example on github:
G = nn.Sequential()
with G.name_scope():
G.add(nn.Dense(300, activation="relu"))
G.add(nn.Dense(28 * 28, activation="sigmoid"))
D = nn.Sequential()
with D.name_scope():
D.add(nn.Dense(128, activation="relu"))
D.add(nn.Dense(64, activation="relu"))
D.add(nn.Dense(32, activation="relu"))
D.add(nn.Dense(2, activation="sigmoid"))
Sigmoid never goes below zero and it works better in this scenario. Here is a sample picture I get if I train updated model for 30 epochs (the rest of the hyperparameters are same).
If you decide to explore dcGAN to get even better results, take a look here - https://mxnet.incubator.apache.org/tutorials/unsupervised_learning/gan.html It is a well explained tutorial on how to build dcGAN with Mxnet and Gluon. By using dcGAN you will get way better results than that.