Is a step response enough information for deconvolution?

I have a measurement system, which responds to a step (green line) with an exponential decline (blue line, which would be the measured data).
I want to go back from the blue line to the green line using deconvolution. Is this step response already sufficient information for the deconvolution, or would I need the impulse response?
Thanks for your help,

I had the same problem. I think it can be addressed by using the fact that the Dirac delta is the derivative of the Heaviside (step) function. You just need to take the numerical derivative of your step response and use it as the impulse response for the deconvolution.
Here is an example:
```python
import numpy as np
from scipy.special import erfinv, erf
from scipy.signal import deconvolve, convolve, resample, decimate, resample_poly
from numpy.fft import fft, ifft, ifftshift


def deconvolve_fun(obs, signal):
    """Find convolution filter

    Finds convolution filter from observation and impulse response.
    Noise-free signal is assumed.
    """
    signal = np.hstack((signal, np.zeros(len(obs) - len(signal))))
    Fobs = np.fft.fft(obs)
    Fsignal = np.fft.fft(signal)
    filt = np.fft.ifft(Fobs / Fsignal)
    return filt


def wiener_deconvolution(signal, kernel, lambd=1e-3):
    """Applies Wiener deconvolution to find true observation from signal and filter

    The function can also be used to estimate the filter from true signal and observation.
    """
    # zero pad the kernel to the same length
    kernel = np.hstack((kernel, np.zeros(len(signal) - len(kernel))))
    H = fft(kernel)
    deconvolved = np.real(ifft(fft(signal) * np.conj(H) / (H * np.conj(H) + lambd**2)))
    return deconvolved


def get_signal(time, offset_x, offset_y, reps=4, lambd=1e-3):
    """Model step response as error function"""
    ramp_up = erf(time * multiplier)
    ramp_down = 1 - ramp_up
    if (reps % 1) == 0.5:
        # half a period: a single rising edge
        signal = np.hstack((np.zeros(offset_x), ramp_up)) + offset_y
    else:
        signal = np.hstack((np.zeros(offset_x),
                            np.tile(np.hstack((ramp_up, ramp_down)), reps),
                            np.zeros(offset_x))) + offset_y
    signal += np.random.randn(*signal.shape) * lambd
    return signal


def make_filter(signal, offset_x):
    """Obtain filter from response to step function

    Takes derivative of Heaviside to get Dirac. Avoid zeros at both ends.
    """
    # impulse response. Step function is integration of dirac delta
    hvsd = signal[offset_x:]
    dirac = np.gradient(hvsd)  # + offset_y
    dirac = dirac[dirac > 0.0001]
    return dirac, hvsd


def get_step(time, offset_x, offset_y, reps=4):
    """Creates true step response"""
    ramp_up = np.heaviside(time, 0)
    ramp_down = 1 - ramp_up
    step = np.hstack((np.zeros(offset_x),
                      np.tile(np.hstack((ramp_up, ramp_down)), reps),
                      np.zeros(offset_x))) + offset_y
    return step


# Worst case scenario from specs : signal Time t98% < 60 s at 25 °C
multiplier = erfinv(0.98) / 60
offset_y = .01
offset_x = 300
reps = 1
time = np.arange(301)
lambd = 0
sampling_time = 3  # s

signal = get_step(time, offset_x, offset_y, reps=reps)
filter = get_signal(time, offset_x, offset_y, reps=0.5, lambd=lambd)
filter, hvsd = make_filter(filter, offset_x)
observation = get_signal(time, offset_x, offset_y, reps=reps, lambd=lambd)
assert len(signal) == len(observation)
observation_est = convolve(signal, filter, mode="full")[:len(observation)]
signal_est = wiener_deconvolution(observation, filter, lambd)[:len(observation)]
filt_est = wiener_deconvolution(observation, signal, lambd)[:len(filter)]
```
This will allow you to obtain two figures, titled "Heaviside and Dirac" and "Signal and Filter Estimate".
You may also benefit from checking other related posts and the Wiener deconvolution example that my code partly builds on.
Let me know if this helps.
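The same idea in a compact, self-contained form, as a sketch under stated assumptions: a first-order lag with step response 1 - exp(-n/tau) stands in for the measured curve, the data are noise-free, and the helpers impulse_from_step and fft_deconvolve are hypothetical names, not library functions. Differencing the sampled step response gives the impulse response, which is then used for a regularised FFT deconvolution.

```python
import numpy as np


def impulse_from_step(step_response):
    """Impulse response as the first difference of a sampled step response.

    A unit step is the running sum of a unit impulse, so for a linear
    time-invariant system the step response is the running sum of the
    impulse response; differencing undoes that sum.
    """
    return np.diff(step_response, prepend=0.0)


def fft_deconvolve(y, h, n_out, eps=1e-12):
    """Recover x from y = x * h with a regularised inverse filter.

    The FFT length covers the full linear convolution, so no circular
    wrap-around occurs; eps plays the role of the Wiener noise term.
    """
    n_fft = len(y) + len(h)
    Y = np.fft.rfft(y, n_fft)
    H = np.fft.rfft(h, n_fft)
    X = Y * np.conj(H) / (np.abs(H) ** 2 + eps)
    return np.fft.irfft(X, n_fft)[:n_out]


# Assumed "measured" step response: first-order lag, time constant tau samples.
tau = 20.0
n = np.arange(200)
step_response = 1.0 - np.exp(-n / tau)
h = impulse_from_step(step_response)

# Hypothetical true input (a rectangular pulse) and its noise-free measurement.
x_true = np.zeros(300)
x_true[50:150] = 1.0
y = np.convolve(x_true, h)

x_rec = fft_deconvolve(y, h, len(x_true))
print(np.max(np.abs(x_rec - x_true)))  # tiny for noise-free data
```

With real, noisy measurements eps has to be much larger (this is the role lambd plays in wiener_deconvolution above), because differentiating the step response amplifies high-frequency noise.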


Training seq2seq LM over multiple iterations in PyTorch, seems like lack of connection between encoder and decoder

My seq2seq model seems to only learn to produce sequences of popular words like:
"i don't . i don't . i don't . i don't . i don't"
I think that might be due to a lack of actual data flow between encoder and decoder.
That happens whether I use encoder.init_hidden() or encoder_hidden.detach().
If I use neither, I get an error:
"RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward."
If I try to use retain_graph=True, I get another error:
"RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [256, 768]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True)."
This seems to be a very common use case, but despite reading all the similar questions and the documentation, and despite my experiments, I cannot solve it.
Am I missing something obvious?
```python
import torch
from torch import nn, optim

encoder = Encoder(embedding_dim, hidden_size, max_seq_len, num_layers, vocab.len(), word_embeddings).to(device)
decoder = Decoder(embedding_dim, hidden_size, num_layers, vocab.len()).to(device)
loss_function = nn.CrossEntropyLoss(ignore_index=0)
# parameters() returns generators, which cannot be added directly
optimizer = optim.SGD(params=list(encoder.parameters()) + list(decoder.parameters()), lr=learn_rate)
encoder_hidden = encoder.init_hidden()
for epoch in range(num_epochs):
    epoch_loss = 0
    num_samples = 0
    j = 0
    for prompts, responses in train_data_loader:
        #encoder_hidden = encoder.init_hidden() # new tensor of zeroes
        encoder_hidden = encoder_hidden.detach()
        encoder_output, encoder_hidden = encoder(prompts, encoder_hidden)
        decoder_hidden = encoder.transform_hidden(encoder_hidden)
        batch_size = responses.size(0)
        decoder_input = torch.tensor([[SOS_TOKEN]] * batch_size, device=device)
        decoder_outputs = []
        sequence_length = responses.shape[1]
        for i in range(sequence_length):
            word_index = responses[:, i:i+1]
            decoder_output, _ = decoder(decoder_input, decoder_hidden)
            decoder_outputs.append(decoder_output)
            decoder_input = word_index
        decoder_outputs_t = torch.cat(decoder_outputs, dim=1)
        decoder_outputs_t = decoder_outputs_t.permute(0, 2, 1)
        optimizer.zero_grad()
        loss = loss_function(decoder_outputs_t, responses)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
        num_samples += 1
        j += 1
    mean_loss = epoch_loss / num_samples
```

How can I improve this Reinforcement Learning scenario in Stable Baselines3?

In this scenario, I present a box observation with numbers 0, 1 or 2 and shape (1, 10).
The probabilities for 0 and 2 are 2% each, and 96% for 1.
I want the model to learn to pick the index of any 2 that appears. If there is no 2, it should just choose 0.
Below is my code:
```python
import numpy as np
import gym
from gym import spaces
from stable_baselines3 import PPO, DQN, A2C
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecFrameStack

action_length = 10


class TestBot(gym.Env):
    def __init__(self):
        super(TestBot, self).__init__()
        self.total_rewards = 0
        self.time = 0
        self.action_space = spaces.Discrete(action_length)
        self.observation_space = spaces.Box(low=0, high=2, shape=(1, action_length), dtype=np.float32)

    def generate_next_obs(self):
        p = [0.02, 0.02, 0.96]
        a = [0, 2, 1]
        self.observation = np.random.choice(a, size=(1, action_length), p=p)
        if 2 in self.observation[0][1:]:
            self.best_reward += 1

    def reset(self):
        if self.time != 0:
            print('Total rewards: ', self.total_rewards, 'Best possible rewards: ', self.best_reward)
        self.best_reward = 0
        self.time = 0
        self.total_rewards = 0
        self.generate_next_obs()
        self.last_observation = self.observation
        return self.observation

    def step(self, action):
        reward = 0
        if action != 0:
            last_value = self.last_observation[0][action]
            if last_value == 2:
                reward = 1
            else:
                reward = -1
        self.time += 1
        done = self.time == 4096
        info = {}
        self.generate_next_obs()
        self.last_observation = self.observation
        self.total_rewards += reward
        return self.observation, reward, done, info
```
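For reference, the target behaviour described at the start of the question can be hand-coded and used as a baseline to compare the agent against. The oracle_action helper below is hypothetical and not part of the original code. With the question's probabilities, only about one step in six offers a 2 in positions 1..9, so positive rewards are sparse.

```python
import numpy as np


def oracle_action(obs):
    """Hand-coded reference policy for a (1, 10) observation.

    Returns the first index in 1..9 holding a 2, otherwise 0 (index 0 acts
    as the 'no pick' action, mirroring step() and generate_next_obs()).
    """
    hits = np.flatnonzero(obs[0, 1:] == 2)
    return int(hits[0]) + 1 if hits.size else 0


# Chance that a freshly drawn observation offers a 2 somewhere in positions 1..9:
p_two_available = 1 - 0.98 ** 9
print(round(p_two_available, 4))  # 0.1663

print(oracle_action(np.array([[1, 1, 2, 1, 1, 1, 1, 1, 2, 1]])))  # 2
print(oracle_action(np.array([[2, 1, 1, 1, 1, 1, 1, 1, 1, 1]])))  # 0
```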
For training, I used the following:
```python
env = TestBot()
env = make_vec_env(lambda: env, n_envs=1)
model = PPO('MlpPolicy', env, verbose=0)
iters = 0
while True:
    iters += 1
    model.learn(total_timesteps=4096, reset_num_timesteps=True)
```
PPO gave the best result, which wasn't great. It learned to obtain positive rewards, but it took a long time and got stuck at a point far from optimal.
How can I improve the learning of this scenario?
I managed to solve my problem by tuning the PPO parameters.
I had to change the following parameters:
gamma: from 0.99 to 0. It determines the importance of future rewards in the decision-making process. A value of 0 means that only immediate rewards are considered, which fits this task because each step's reward depends only on the current observation and action.
gae_lambda: from 0.95 to 0.65. The gae_lambda parameter is used in Generalized Advantage Estimation (GAE), a method for estimating the advantage function, i.e. how much better a given action is than the average action in that state. A lower value weights the estimate more towards the one-step temporal-difference error (less variance, more bias) and less towards longer multi-step returns.
clip_range: from 0.2 to a schedule function. It limits how much the policy can change in a single update: PPO clips the probability ratio between the new and the old policy to [1 - clip_range, 1 + clip_range]. Large policy changes stop being useful late in training, so I made a function that allows larger updates in the first few iterations and goes to 0 at the end.
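A simplified NumPy sketch of the two quantities these parameters enter. It is not SB3's implementation: it assumes a single trajectory with no episode boundaries, and gae and clipped_surrogate are hypothetical helper names. It also shows that once gamma is 0, the advantage reduces to r - V(s), so gae_lambda no longer has any effect in this setup.

```python
import numpy as np


def gae(rewards, values, next_values, gamma, lam):
    """Simplified Generalized Advantage Estimation for one trajectory without episode ends.

    delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    A_t     = sum over l of (gamma * lam)**l * delta_{t+l}
    """
    deltas = rewards + gamma * next_values - values
    advantages = np.zeros_like(deltas)
    running = 0.0
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages


def clipped_surrogate(ratio, advantage, clip_range):
    """PPO's clipped objective (maximised): min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)."""
    clipped = np.clip(ratio, 1.0 - clip_range, 1.0 + clip_range)
    return np.minimum(ratio * advantage, clipped * advantage)


rng = np.random.default_rng(0)
r = rng.normal(size=8)             # hypothetical rewards
v = rng.normal(size=8)             # hypothetical value estimates V(s_t)
v_next = np.append(v[1:], 0.0)     # V(s_{t+1}); 0 after the last step

# With gamma = 0 the advantage collapses to r - V(s), whatever gae_lambda is.
a1 = gae(r, v, v_next, gamma=0.0, lam=0.95)
a2 = gae(r, v, v_next, gamma=0.0, lam=0.65)
print(np.allclose(a1, r - v), np.allclose(a1, a2))  # True True

# A probability ratio of 1.5 with a positive advantage only counts up to 1 + clip_range.
print(clipped_surrogate(np.array([1.5]), np.array([1.0]), 0.2))  # [1.2]
```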
I also made a small modification to the environment to penalize missed opportunities to pick an index containing a 2 more heavily, but that was done only to accelerate training.
The following is my final code:
```python
env = TestBot()
env = make_vec_env(lambda: env, n_envs=1)
iters = 0


def clip_range_schedule():
    def real_clip_range(progress):
        global iters
        cr = 0.2
        if iters > 20:
            cr = 0.0
        elif iters > 12:
            cr = 0.05
        elif iters > 6:
            cr = 0.1
        return cr
    return real_clip_range


model = PPO('MlpPolicy', env, verbose=0, gamma=0.0, gae_lambda=0.65, clip_range=clip_range_schedule())
while True:
    iters += 1
    model.learn(total_timesteps=4096, reset_num_timesteps=True)
```

Modifying the loss in ppo in stable-baselines3

I'm trying to add a term to the loss function of the PPO algorithm in stable-baselines3. For this I collected additional observations for the states s(t-10) and s(t+1), which I can access in the train function of the PPO class as part of the rollout_buffer.
I'm using a 3-layer MLP as my network architecture and need the outputs of the second layer for the triplet (s(t-α), s(t), s(t+1)) to calculate L = max(d(s(t+1), s(t)) − d(s(t+1), s(t−α)) + γ, 0), where d is the L2 distance.
Finally, I want to add this term to the old loss, so loss = loss + 0.3 * L.
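The hinge term itself can be computed for a whole batch at once. Below is a minimal NumPy sketch (triplet_margin_loss is a hypothetical helper; anchor = features of s(t+1), positive = s(t), negative = s(t-α), margin = γ). In PyTorch the same batched form can be written with torch.linalg.norm(..., dim=1) and torch.clamp(..., min=0), which keeps the term inside autograd; rebuilding a tensor from the individual results with th.FloatTensor(...) is likely to produce a tensor with no gradient history, so the extra term would not contribute to loss.backward().

```python
import numpy as np


def triplet_margin_loss(f_anchor, f_pos, f_neg, margin=1.0):
    """Batch mean of max(d(anchor, pos) - d(anchor, neg) + margin, 0) with d = L2 distance.

    Rows are samples. In the question's notation: anchor = f(s(t+1)),
    pos = f(s(t)), neg = f(s(t-alpha)), margin = gamma.
    """
    d_pos = np.linalg.norm(f_anchor - f_pos, axis=1)
    d_neg = np.linalg.norm(f_anchor - f_neg, axis=1)
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))


anchor = np.array([[0.0, 0.0], [0.0, 0.0]])
pos = np.array([[3.0, 4.0], [0.0, 1.0]])   # distances 5 and 1
neg = np.array([[0.0, 1.0], [3.0, 4.0]])   # distances 1 and 5
print(triplet_margin_loss(anchor, pos, neg))  # 2.5
```

The first triplet contributes 5 - 1 + 1 = 5, the second is clipped to 0 (1 - 5 + 1 < 0), so the batch mean is 2.5.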
This is my implementation starting with the original loss in line 242:
```python
loss = policy_loss + self.ent_coef * entropy_loss + self.vf_coef * value_loss

net1 = nn.Sequential(*list(self.policy.mlp_extractor.policy_net.children())[:-1])
L_losses = []
a = 0
obs = rollout_data.observations
obs_alpha = rollout_data.observations_alpha
obs_plusone = rollout_data.observations_plusone
inds = rollout_data.inds
for i in inds:
    if i > alpha:  # only use observations for which L can be calculated
        fs_t = net1(obs[a])
        fs_talpha = net1(obs_alpha[a])
        fs_tone = net1(obs_plusone[a])
        L = max(
            th.norm(th.subtract(fs_tone, fs_t)) - th.norm(th.subtract(fs_tone, fs_talpha)) + 1.0, 0.0)
        L_losses.append(L)
    a += 1
L_loss = th.mean(th.FloatTensor(L_losses))
loss += 0.3 * L_loss
```
With net1 I tried to get a clone of the original network that outputs the activations of the second layer. I am unsure if this is the right way to do it.
I have some questions about my approach, as the resulting performance is slightly worse than without the added term, although it should be slightly better:
Is my way of getting the outputs of the second layer of the mlp network working?
When loss.backward() is called can the gradient be calculated correctly (with the new term included)?

Removing Softmax from last layer yields a lot better results

I was working on an NLP task: translating English sentences to German in Keras. The model was not learning, but as soon as I removed the softmax from the last layer, it started working! Is this a bug in Keras, or does it have to do with something else?
```python
import tensorflow as tf
from tensorflow.keras.layers import Embedding, GRU, Dense, Softmax, Concatenate
from tensorflow.keras.activations import tanh
from tensorflow.keras.optimizers import Adam

optimizer = Adam()
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction='none')


def loss_function(real, pred):
    # ignore padding positions (token 0)
    mask = tf.math.logical_not(tf.math.equal(real, 0))
    loss_ = loss_object(real, pred)
    mask = tf.cast(mask, dtype=loss_.dtype)
    loss_ *= mask
    return tf.reduce_mean(loss_)


batch_size = 64
batch_per_epoch = int(train_x1.shape[0] / batch_size)
embed_dim = 256
units = 1024
attention_units = 10
encoder_embed = Embedding(english_vocab_size, embed_dim)
decoder_embed = Embedding(german_vocab_size, embed_dim)
encoder = GRU(units, return_sequences=True, return_state=True, recurrent_initializer='glorot_uniform')
decoder = GRU(units, return_sequences=True, return_state=True, recurrent_initializer='glorot_uniform')
dense = Dense(german_vocab_size)
attention1 = Dense(attention_units)
attention2 = Dense(attention_units)
attention3 = Dense(1)


def train_step(english_input, german_target):
    loss = 0
    with tf.GradientTape() as tape:
        enc_output, enc_hidden = encoder(encoder_embed(english_input))
        dec_hidden = enc_hidden
        dec_input = tf.expand_dims([german_tokenizer.word_index['startseq']] * batch_size, 1)
        for i in range(1, german_target.shape[1]):
            attention_weights = attention1(enc_output) + attention2(tf.expand_dims(dec_hidden, axis=1))
            attention_weights = tanh(attention_weights)
            attention_weights = attention3(attention_weights)
            attention_weights = Softmax(axis=1)(attention_weights)
            Context_Vector = tf.reduce_sum(enc_output * attention_weights, axis=1)
            Context_Vector = tf.expand_dims(Context_Vector, axis=1)
            x = decoder_embed(dec_input)
            x = Concatenate(axis=-1)([x, Context_Vector])
            dec_output, dec_hidden = decoder(x)
            output = tf.reshape(dec_output, (-1, dec_output.shape[2]))
            prediction = dense(output)
            loss += loss_function(german_target[:, i], prediction)
            dec_input = tf.expand_dims(german_target[:, i], 1)
    batch_loss = (loss / int(german_target.shape[1]))
    variables = encoder_embed.trainable_variables + decoder_embed.trainable_variables + encoder.trainable_variables + decoder.trainable_variables + dense.trainable_variables + attention1.trainable_variables + attention2.trainable_variables + attention3.trainable_variables
    gradients = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(gradients, variables))
    return batch_loss
```
Code Summary
The code takes the English sentence and the German sentence as input (the German sentence is fed in to implement teacher forcing) and predicts the translated German sentence.
The loss function is SparseCategoricalCrossentropy, but it masks out the loss at the padding positions (token 0). For example, say we have the sentence 'StartSeq This is Stackoverflow 0 0 0 0 0 EndSeq' (the zero padding makes all input sentences the same length). We calculate the loss for every word but not for the 0s. Doing this makes the model better.
Note: this model implements Bahdanau attention.
When I apply softmax to the output of the last layer, the model doesn't learn anything, but it learns properly without softmax in the last layer. Why is this happening?
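A likely explanation, sketched in NumPy under simplifying assumptions (a single prediction over a 1000-word vocabulary; softmax and xent_from_logits are hypothetical helpers): the loss is created with from_logits=True, so SparseCategoricalCrossentropy applies a softmax itself. If the last layer already applies softmax, the outputs are squashed a second time before the loss sees them.

```python
import numpy as np


def softmax(z):
    z = z - np.max(z)           # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def xent_from_logits(logits, target):
    """What a from_logits=True cross-entropy does: softmax first, then -log p[target]."""
    return float(-np.log(softmax(logits)[target]))


vocab = 1000
logits = np.zeros(vocab)
logits[0] = 10.0                             # model strongly prefers the correct token 0

probs = softmax(logits)                      # what a softmax last layer would output
loss_logits = xent_from_logits(logits, 0)    # raw logits into the loss (as intended)
loss_double = xent_from_logits(probs, 0)     # probabilities into the loss: softmax applied twice

# Lower bound of the double-softmax loss, reached only by a perfect one-hot prediction.
floor = float(np.log(vocab - 1 + np.e) - 1.0)
print(loss_logits, loss_double, floor)
```

Here loss_logits is about 0.04, while loss_double is about 5.95, barely above the floor of about 5.91 (roughly log(1000) - 1): however good the predictions get, the loss hardly responds, so the training signal is very weak. The usual fixes are to drop the softmax layer, as in the code above, or to keep it and construct the loss with from_logits=False.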

Use neural network to learn a square wave function

Out of curiosity, I am trying to build a simple fully connected NN using tensorflow to learn a square wave function such as the following one:
Therefore the input is a 1D array of x values (the horizontal axis), and the output is a binary scalar value. I used tf.nn.sparse_softmax_cross_entropy_with_logits as the loss function and tf.nn.relu as the activation. There are 3 hidden layers (100*100*100), a single input node and an output node. The input data are generated to match the above wave shape, so data size is not a problem.
However, the trained model seems to fail completely, always predicting the negative class.
So I am trying to figure out why this happens: whether the NN configuration is suboptimal, or whether there is some mathematical flaw in NNs beneath the surface (though I think an NN should be able to imitate any function).
As per suggestions in the comment section, here is the full code. One thing I stated wrongly earlier: there are actually 2 output nodes (one per output class):
```python
"""See if neural net can find piecewise linear correlation in the data"""
import time
import os
import tensorflow as tf
import numpy as np


def generate_placeholder(batch_size):
    x_placeholder = tf.placeholder(tf.float32, shape=(batch_size, 1))
    y_placeholder = tf.placeholder(tf.float32, shape=(batch_size))
    return x_placeholder, y_placeholder


def feed_placeholder(x, y, x_placeholder, y_placeholder, batch_size, loop):
    x_selected = [[None]] * batch_size
    y_selected = [None] * batch_size
    for i in range(batch_size):
        x_selected[i][0] = x[min(loop*batch_size, loop*batch_size % len(x)) + i, 0]
        y_selected[i] = y[min(loop*batch_size, loop*batch_size % len(y)) + i]
    feed_dict = {x_placeholder: x_selected,
                 y_placeholder: y_selected}
    return feed_dict


def inference(input_x, H1_units, H2_units, H3_units):
    with tf.name_scope('H1'):
        weights = tf.Variable(tf.truncated_normal([1, H1_units], stddev=1.0/2), name='weights')
        biases = tf.Variable(tf.zeros([H1_units]), name='biases')
        a1 = tf.nn.relu(tf.matmul(input_x, weights) + biases)
    with tf.name_scope('H2'):
        weights = tf.Variable(tf.truncated_normal([H1_units, H2_units], stddev=1.0/H1_units), name='weights')
        biases = tf.Variable(tf.zeros([H2_units]), name='biases')
        a2 = tf.nn.relu(tf.matmul(a1, weights) + biases)
    with tf.name_scope('H3'):
        weights = tf.Variable(tf.truncated_normal([H2_units, H3_units], stddev=1.0/H2_units), name='weights')
        biases = tf.Variable(tf.zeros([H3_units]), name='biases')
        a3 = tf.nn.relu(tf.matmul(a2, weights) + biases)
    with tf.name_scope('softmax_linear'):
        weights = tf.Variable(tf.truncated_normal([H3_units, 2], stddev=1.0/np.sqrt(H3_units)), name='weights')
        biases = tf.Variable(tf.zeros([2]), name='biases')
        logits = tf.matmul(a3, weights) + biases
    return logits


def loss(logits, labels):
    labels = tf.to_int32(labels)
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits, name='xentropy')
    return tf.reduce_mean(cross_entropy, name='xentropy_mean')


def inspect_y(labels):
    return tf.reduce_sum(tf.cast(labels, tf.int32))


def training(loss, learning_rate):
    tf.summary.scalar('lost', loss)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    global_step = tf.Variable(0, name='global_step', trainable=False)
    train_op = optimizer.minimize(loss, global_step=global_step)
    return train_op


def evaluation(logits, labels):
    labels = tf.to_int32(labels)
    correct = tf.nn.in_top_k(logits, labels, 1)
    return tf.reduce_sum(tf.cast(correct, tf.int32))


def run_training(x, y, batch_size):
    with tf.Graph().as_default():
        x_placeholder, y_placeholder = generate_placeholder(batch_size)
        logits = inference(x_placeholder, 100, 100, 100)
        Loss = loss(logits, y_placeholder)
        y_sum = inspect_y(y_placeholder)
        train_op = training(Loss, 0.01)
        init = tf.global_variables_initializer()
        sess = tf.Session()
        sess.run(init)
        max_steps = 10000
        for step in range(max_steps):
            start_time = time.time()
            feed_dict = feed_placeholder(x, y, x_placeholder, y_placeholder, batch_size, step)
            _, loss_val = sess.run([train_op, Loss], feed_dict=feed_dict)
            duration = time.time() - start_time
            if step % 100 == 0:
                print('Step {}: loss = {:.2f} {:.3f}sec'.format(step, loss_val, duration))
        x_test = np.array(range(1000)) * 0.001
        x_test = np.reshape(x_test, (1000, 1))
        _ = sess.run(logits, feed_dict={x_placeholder: x_test})
        print(min(_[:, 0]), max(_[:, 0]), min(_[:, 1]), max(_[:, 1]))


if __name__ == '__main__':
    population = 10000
    input_x = np.random.rand(population)
    input_y = np.copy(input_x)
    for bin in range(10):
        print(bin, bin/10, 0.5 - 0.5*(-1)**bin)
        input_y[input_x >= bin/10] = 0.5 - 0.5*(-1)**bin
    batch_size = 1000
    input_x = np.reshape(input_x, (population, 1))
    run_training(input_x, input_y, batch_size)
```
Sample output shows that the model always prefers the first class over the second: min(_[:, 0]) > max(_[:, 1]), i.e. the minimum logit for the first class is higher than the maximum logit for the second class across all 1000 test inputs.
My mistake. The problem occurred in these lines:
```python
for i in range(batch_size):
    x_selected[i][0] = x[min(loop*batch_size, loop*batch_size % len(x)) + i, 0]
    y_selected[i] = y[min(loop*batch_size, loop*batch_size % len(y)) + i]
```
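[[None]] * batch_size creates batch_size references to the same inner list, so every assignment to x_selected[i][0] overwrites one shared element, and all rows end up holding the same value. This code issue is now resolved. The fix is: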
```python
x_selected = np.zeros((batch_size, 1))
y_selected = np.zeros((batch_size,))
for i in range(batch_size):
    x_selected[i, 0] = x[(loop*batch_size + i) % x.shape[0], 0]
    y_selected[i] = y[(loop*batch_size + i) % y.shape[0]]
```
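A three-row toy reproduction of the list aliasing behind the original bug, together with the list-comprehension alternative (the NumPy arrays in the fix avoid the issue altogether):

```python
# [[None]] * 3 repeats a reference to ONE inner list three times.
rows = [[None]] * 3
for i in range(3):
    rows[i][0] = i          # every assignment writes into the same shared list
print(rows)                 # [[2], [2], [2]]
print(rows[0] is rows[2])   # True

# A list comprehension creates an independent inner list per row.
rows_ok = [[None] for _ in range(3)]
for i in range(3):
    rows_ok[i][0] = i
print(rows_ok)              # [[0], [1], [2]]
```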
After this fix, the model shows more variation: it currently outputs class 0 for x <= 0.5 and class 1 for x > 0.5, but this is still far from ideal.
After changing the network configuration to 100 nodes * 4 layers and training for 1 million steps (batch size = 100, sample size = 10 million), the model performs very well, with errors only at the edges where y flips.
Therefore this question is closed.
You are essentially trying to learn a periodic function, and that function is highly non-linear and non-smooth, so it is NOT as simple as it looks. In short, a better representation of the input feature helps.
Suppose you have a period T = 2, i.e. f(x) = f(x + 2).
For a reduced problem where the inputs and outputs are integers, your function is then f(x) = 1 if x is odd, else -1. In this case, your problem reduces to this discussion, in which we train a neural network to distinguish between odd and even numbers.
I guess the second bullet in that post should help (even for the general case where the inputs are floating-point numbers):
Try representing the numbers in binary using a fixed-length precision.
In our reduced problem above, it's easy to see that the output is determined as soon as the least-significant bit is known.
decimal binary -> output
1: 0 0 1 -> 1
2: 0 1 0 -> -1
3: 0 1 1 -> 1
I created the model and the structure for the problem of recognizing odd/even numbers here.
Note that the table above is equivalent, up to relabelling -1 as 0, to:
decimal binary -> output
1: 0 0 1 -> 1
2: 0 1 0 -> 0
3: 0 1 1 -> 1
You may update the code to fit your needs.
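Applied to the question's own data, the binary-feature idea could look like the following minimal NumPy sketch. The helper names are hypothetical, and it assumes x lies in [0, 1) with the 0.1-wide bins used in the question. Instead of the raw x, the network would receive the 4-bit binary code of the bin index, and the label becomes simply the last bit.

```python
import numpy as np


def bin_index(x, n_bins=10):
    """Index b of the bin with b / n_bins <= x < (b + 1) / n_bins, matching the question's loop."""
    edges = np.arange(n_bins) / n_bins
    return np.searchsorted(edges, x, side="right") - 1


def square_wave_labels(x):
    """Labels built exactly as in the question's __main__ block."""
    y = np.zeros_like(x)
    for b in range(10):
        y[x >= b / 10] = 0.5 - 0.5 * (-1) ** b
    return y.astype(np.int64)


def to_bits(n, width):
    """Fixed-width binary encoding of non-negative integers, most significant bit first."""
    shifts = np.arange(width - 1, -1, -1)
    return (np.asarray(n)[:, None] >> shifts) & 1


rng = np.random.default_rng(0)
x = rng.random(10_000)
features = to_bits(bin_index(x), width=4)   # shape (10000, 4): the new network input
labels = square_wave_labels(x)

# The label is exactly the least-significant bit of the encoded bin index.
print(np.array_equal(features[:, -1], labels))  # True
```

Note that this encoding only helps because it builds in the bin width (the half-period); for a wave with a different period the binning would have to change accordingly.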
