cntk.input_variable() vs. cntk.sequence.input_variable() - placeholder

What is the difference between cntk.input_variable() and cntk.sequence.input_variable()? I want to create some placeholders for the inputs and outputs, but I'm not sure which one I should use. PS: my CNTK version is 2.4.

cntk.input_variable creates an input variable that only has a single dynamic axis (the batch axis). Values passed to this type of input variable should be samples (or minibatches of samples), not sequences. cntk.sequence.input_variable creates an input variable with two dynamic axes, a sequence axis and a batch axis. The expected input value for this type of input is an entire sequence (of an arbitrary length).
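As a minimal sketch of the difference (the shape here is arbitrary):
import cntk as C

# One dynamic axis (the batch axis): each value is a single 3-dim sample
x = C.input_variable(shape=(3,))

# Two dynamic axes (sequence and batch): each value is a whole sequence
# of 3-dim samples, of arbitrary length
s = C.sequence.input_variable(shape=(3,))

print(len(x.dynamic_axes))  # 1
print(len(s.dynamic_axes))  # 2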

Related

Does Q-Learning apply here?

Let's say we have an algorithm that, given a dataset point, runs some analysis on it and returns the results. The algorithm has a user-defined parameter X that affects its run-time (the result of the algorithm is always the same for the same input point). We also already know that there is a relation between the dataset point and the parameter X: for instance, if two dataset points are close to each other, their parameter X will also be the same.
Can we say that in this example we have the following and thus can use Q-Learning to find the best parameter X given any dataset point?
Initial state: dataset point, current value of X (for initial state = 0)
Terminal state: dataset point, current value of X (the value chosen based on action)
Actions: Different values that X can have
Reward: -1 if execution time decreases, +1 if it increases, 0 if it stays the same
Is it correct to define different input dataset points as episodes and different values of X as the steps in each episode (where in each step an action is chosen either randomly or via the network)? In this case, what would be the input to the neural network?
Since all of the examples and implementations I've seen so far contain several states, each dependent on the previous one, I'm confused by my scenario, in which I only have two states.
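For concreteness, here is a hypothetical sketch of the setup described above (run_algorithm and the candidate X_VALUES are placeholders, not part of the original problem). Because each episode ends after a single action, the Q-update has no bootstrapped next-state term:
import numpy as np

X_VALUES = [1, 2, 4, 8]   # hypothetical candidate values for X
q_table = {}              # maps (point_id, action index) -> Q estimate
alpha = 0.1               # learning rate

def run_episode(point_id, run_algorithm, epsilon=0.1):
    # epsilon-greedy choice over the candidate X values
    if np.random.rand() < epsilon:
        a = np.random.randint(len(X_VALUES))
    else:
        a = max(range(len(X_VALUES)),
                key=lambda i: q_table.get((point_id, i), 0.0))
    # reward derived from the measured run-time, as described above
    reward = run_algorithm(point_id, X_VALUES[a])
    old = q_table.get((point_id, a), 0.0)
    # terminal after one step: incremental update with no max over a next state
    q_table[(point_id, a)] = old + alpha * (reward - old)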

Encoding numeric nominal values in machine learning

I am working on a machine learning problem based on network data, where one of the columns in my dataset is Destination Port, with values like 30, 80, 1024, etc.
Since the numeric values in this column are not ordinal, how do I transform this column so that I can feed it as an input to a machine learning model? The column has about 480 unique ports.
It's called normalization. The goal of normalization is to change the values of numeric columns in the dataset to a common scale, without distorting differences in the ranges of values or losing information.
import pandas as pd
from sklearn import preprocessing

# Extract the column's values as floats
x = df[['name_of_your_column']].values.astype(float)
# Create a min-max scaler object
min_max_scaler = preprocessing.MinMaxScaler()
# Transform the data to the [0, 1] range
x_scaled = min_max_scaler.fit_transform(x)
# Put the scaled values back into a dataframe
df_normalized = pd.DataFrame(x_scaled)
Or you can work on the dataframe directly. For z-score standardization:
normalized_df = (df - df.mean()) / df.std()
To use min-max normalization:
normalized_df = (df - df.min()) / (df.max() - df.min())
Since Destination Port is a nominal feature, it can be encoded using either label encoding or one-hot encoding, as sketched below.
Label encoding
Advantage: no increase in dimension
Disadvantage: can impose an ordinal effect on the model
One-hot encoding
Advantage: no ordinal effect on the model
Disadvantage: increase in dimension
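A minimal sketch of both encodings with pandas and scikit-learn (the column and dataframe names are placeholders):
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'dst_port': [30, 80, 1024, 80]})

# Label encoding: each unique port becomes a single integer (one column)
df['dst_port_encoded'] = LabelEncoder().fit_transform(df['dst_port'])

# One-hot encoding: one binary column per unique port
# (with ~480 unique ports this adds ~480 columns)
one_hot = pd.get_dummies(df['dst_port'], prefix='port')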

Using sample_weights with fit_generator()

In an autoregressive continuous problem, when zeros take up too much of the target, it is possible to treat the situation as a zero-inflated problem (i.e. ZIB). In other words, instead of fitting f(x) directly, we want to fit g(x)*f(x), where f(x) is the function we want to approximate, i.e. y, and g(x) is a function that outputs a value between 0 and 1 depending on whether the value is zero or non-zero.
Currently, I have two models. One model which gives me g(x) and another model which fits g(x)*f(x).
The first model gives me a set of weights. This is where I need your help. I could use the sample_weights argument with model.fit(), but as I work with a tremendous amount of data, I need to use model.fit_generator(), and fit_generator() does not have a sample_weights argument.
Is there a workaround to use sample_weights with fit_generator()? Otherwise, how can I fit g(x)*f(x), knowing that I already have a trained model for g(x)?
You can provide sample weights as the third element of the tuple returned by the generator. From Keras documentation on fit_generator:
generator: A generator or an instance of Sequence (keras.utils.Sequence) object in order to avoid duplicate data when using multiprocessing. The output of the generator must be either
a tuple (inputs, targets)
a tuple (inputs, targets, sample_weights).
Update: Here is a rough sketch of a generator that returns the input samples and targets as well as the sample weights obtained from model g(x):
def gen(args):
    while True:
        for i in range(num_batches):
            # get the i-th batch of data
            inputs = ...
            targets = ...
            # get the per-sample weights from the trained g(x) model
            weights = g.predict(inputs)
            yield inputs, targets, weights

model.fit_generator(gen(args), steps_per_epoch=num_batches, ...)

Input of LSTM seq2seq network - Tensorflow

Using the Tensorflow seq2seq tutorial code I am creating a character-based chatbot. I don't use word embeddings. I have an array of characters (the alphabet and some punctuation marks) and special symbols like the GO, EOS and UNK symbols.
Because I'm not using word embeddings, I use the standard tf.nn.seq2seq.basic_rnn_seq2seq() seq2seq model. However, I am confused about what shape encoder_inputs and decoder_inputs should have. Should they be arrays of integers, corresponding to the indices of the characters in the alphabet array, or should I turn those integers into one-hot vectors first?
How many input nodes does one LSTM cell have? Can you specify that? Because I guess in my case an LSTM cell should have an input neuron for each letter in the alphabet (hence the one-hot vectors?).
Also, what is the LSTM "size" you have to pass in the constructor tf.nn.rnn_cell.BasicLSTMCell(size)?
Thank you.
Appendix: these are the bugs I am trying to fix.
When I use the following code, according to the tutorial:
for i in xrange(buckets[-1][0]):  # Last bucket is the biggest one.
    self.encoder_inputs.append(tf.placeholder(tf.int32, shape=[None],
                                              name="encoder{0}".format(i)))
for i in xrange(buckets[-1][1] + 1):
    self.decoder_inputs.append(tf.placeholder(tf.int32, shape=[None],
                                              name="decoder{0}".format(i)))
    self.target_weights.append(tf.placeholder(dtype, shape=[None],
                                              name="weight{0}".format(i)))
And run the self_test() function, I get the error:
ValueError: Linear is expecting 2D arguments: [[None], [None, 32]]
Then, when I change the shapes in the above code to shape=[None, 32] I get this error:
TypeError: Expected int32, got -0.21650635094610965 of type 'float' instead.
The number of inputs of an LSTM cell is the dimension of whatever tensor you pass as inputs to the tf.rnn function when instantiating things.
The size argument is the number of hidden units in your LSTM (so a bigger number is slower but can lead to more accurate models).
I'd need a bigger stack trace to understand these errors.
It turns out the size argument passed to BasicLSTMCell represents both the size of the hidden state of the LSTM and the size of the input layer. So if you want a hidden size different from the input size, you can first propagate your inputs through an additional projection layer, or use the built-in seq2seq word-embedding functions.
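As a rough illustration with the old tf.nn.seq2seq API used in the tutorial (vocab_size and the input lists are placeholders), the embedding variant maps integer ids to dense vectors internally, so no manual one-hot vectors are needed:
size = 256  # number of hidden units in the LSTM
cell = tf.nn.rnn_cell.BasicLSTMCell(size)

# encoder_inputs / decoder_inputs are lists of int32 id tensors, as in
# the placeholder code above; the ids are embedded internally.
outputs, states = tf.nn.seq2seq.embedding_rnn_seq2seq(
    encoder_inputs, decoder_inputs, cell,
    num_encoder_symbols=vocab_size,
    num_decoder_symbols=vocab_size,
    embedding_size=size)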

What is a "label" in Caffe?

In Caffe, when you are defining your inputs for the NN in the protobuf file, you can input "data" and "label". I'm guessing label contains the expected output for training data (what is normally considered the y values in Machine Learning literature).
My problem is that in the caffe.proto file, label is defined as a scalar (int or long). At least with data, I can set it to a numpy array, because it takes String values. If I'm training for more than one prediction output, how could I pass it as an array?
Or am I mistaken? What is label? What is it for? And how can I pass the y values to caffe?
The basic use case of Caffe used to be image classification: assigning a single integer label per input image. Thus, the "datum" data structure reserves space for a 4D float array (batches of 3-channel images) and an integer "label" per image in the batch.
This restriction can easily be overcome using an HDF5 input data layer, as sketched below.
See e.g., this answer.
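For instance, a minimal sketch of writing multi-dimensional targets to an HDF5 file that an HDF5Data layer can read (file names and shapes are placeholders; the layer's source parameter points to a text file listing the .h5 paths):
import h5py
import numpy as np

X = np.random.rand(100, 3, 32, 32).astype(np.float32)  # 100 3-channel images
y = np.random.rand(100, 5).astype(np.float32)          # 5 target values each

with h5py.File('train.h5', 'w') as f:
    f['data'] = X    # matches the "data" top of the HDF5Data layer
    f['label'] = y   # matches the "label" top, now a vector per sample

# train_h5_list.txt should then contain the line: train.h5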
