Transformation mapping between input and output files using Machine Learning - machine-learning

I have two files with me, an input file and an output file. The input file goes through a transformation logic and produces the output file. The issue here is that, I am not aware of the transformation logic between the input and output file.
The input file contains 10 fields and output file contains 7 fields. These 10 fields are transformed into 7 fields using a transformation logic.
Is there way using some machine learning algorithm, to build a model that automatically deduce the relationship between input and output and will be able to predict the output based on the data in the input file?

I think I have something, that might help you solve your issue:
You have various inputs with different datatypes. Also you have different outputs with different datatypes. Lets have this dataset as example when working with tensorflow and keras:
x_categorical=[1,2,3,4,5]
x_categorical_2=np.random.choice(x_categorical, len(x_categorical))
x_continuus=np.random.random_sample(len(x_categorical))
y_categorical = [0,2,3,4,5]
y_continuus = np.random.random_sample(len(x_categorical))
Create the tf.data.Datasets and zip the x and y values together, so it fits the model input:
ds_x = tf.data.Dataset.from_tensor_slices(x_categorical)
ds_x1 = tf.data.Dataset.from_tensor_slices(x_categorical_2)
ds_x2 = tf.data.Dataset.from_tensor_slices(x_continuus)
dataset_x = tf.data.Dataset.zip((ds_x,ds_x1,ds_x2))
ds_y = tf.data.Dataset.from_tensor_slices(y_categorical)
ds_y1 = tf.data.Dataset.from_tensor_slices(y_continuus)
dataset_y = tf.data.Dataset.zip((ds_y,ds_y1))
dataset_train = tf.data.Dataset.zip((dataset_x, dataset_y))
Build an example model, which concatenates the inputs, has one layer which contains the "logic" for the data combination and two more layers, that have a logic for each output:
from tensorflow.keras import layers as layer
layer_input_categorical = layer.Input(shape=(1),name="x_categorical", dtype=tf.float32)
layer_input_categorical_2 = layer.Input(shape=(1),name="x_categorical_2", dtype=tf.float32)
layer_input_continuus = layer.Input(shape=(1),name="x_continuus", dtype=tf.float32)
concat_layer = layer.Concatenate()([layer_input_categorical,layer_input_categorical_2, layer_input_continuus])
dense_layer = layer.Dense(100)(concat_layer)
dense_layer_out_cat = layer.Dense(50)(dense_layer)
dense_layer_out_con = layer.Dense(50)(dense_layer)
output_categorical = layer.Dense(5, activation="softmax")(dense_layer_out_cat)
output_continuus = layer.Dense(1, activation="sigmoid")(dense_layer_out_con)
model = tf.keras.Model(inputs=[layer_input_categorical, layer_input_categorical_2, layer_input_continuus], \
outputs=[output_categorical, output_continuus])
model.compile(optimizer="Nadam", loss=["mse","sparse_categorical_crossentropy"])
Please note the usage of two loss functions (mse for regression and sparse_categorical_crossentropy for classification.
Also note, that the two output layers have different activation functions, softmax for classification (for each class, you get the probability) and one sigmoid for regression.
At the end, just start the training with:
model.fit(dataset_train.batch(1), epochs=20)
For sure this is not the best way of doing it, but it proves, that is is possible.

Related

usage of GRU in non sequential context

I was implementing a GRU in keras, I was still a bit confused about some things, but got to a model:
modelGRU = tf.keras.models.Sequential()
modelGRU.add(layers.Bidirectional(tf.keras.layers.GRU(50, activation='tanh', input_shape=(1, 4))))
modelGRU.add(layers.Dense(99))
Then I've found out that my model does not make any sense, since I put the model parameters (which are 4 parameters like depth, angle, ..., which are the same at all times) in a single GRU. This gives me an output of dimensions 100 (50*2), and then a dense layer is used to generate the 99 outputs. These 99 outputs are a timeseries and that is why I initially taught of GRU, but of course this implementation above is not right, since my model parameters have no time or sequential information. However, this model seems to be working better than the model I have implemented once I understood everything better:
params_input = keras.Input(shape=(99,4))
aantal_units = 5
naRNN = (tf.keras.layers.GRU(aantal_units,input_shape=(99,5),return_sequences=True))(params_input)
ylist = tf.unstack(naRNN,num=99,axis=1)
ylistdense = []
for ii in range(0,99):
yy = tf.keras.layers.Dense(1,activation='linear')(ylist[ii])
ylistdense.append(yy)
conc = tf.keras.layers.concatenate(ylistdense)
model = keras.Model(inputs=params_input,outputs=conc)
Here for my input I copied 99 times the model parameters, in order to have an input of shape (99,4), put these in an GRU layer, and then for every timestep individually I make a dense layer in order to predict the outcome.
Her the architecture of my second implementation is visualized
So my question is: can a GRU be used for non sequential input? or is there something wrong with my second implementation?

Specifying class or sample weights in Keras for one-hot encoded labels in a TF Dataset

I am trying to train an image classifier on an unbalanced training set. In order to cope with the class imbalance, I want either to weight the classes or the individual samples. Weighting the classes does not seem to work. And somehow for my setup I was not able to find a way to specify the samples weights. Below you can read how I load and encode the training data and the two approaches that I tried.
Training data loading and encoding
My training data is stored in a directory structure where each image is place in the subfolder corresponding to its class (I have 32 classes in total). Since the training data is too big too all load at once into memory I make use of image_dataset_from_directory and by that describe the data in a TF Dataset:
train_ds = keras.preprocessing.image_dataset_from_directory (training_data_dir,
batch_size=batch_size,
image_size=img_size,
label_mode='categorical')
I use label_mode 'categorical', so that the labels are described as a one-hot encoded vector.
I then prefetch the data:
train_ds = train_ds.prefetch(buffer_size=buffer_size)
Approach 1: specifying class weights
In this approach I try to specify the class weights of the classes via the class_weight argument of fit:
model.fit(
train_ds, epochs=epochs, callbacks=callbacks, validation_data=val_ds,
class_weight=class_weights
)
For each class we compute weight which are inversely proportional to the number of training samples for that class. This is done as follows (this is done before the train_ds.prefetch() call described above):
class_num_training_samples = {}
for f in train_ds.file_paths:
class_name = f.split('/')[-2]
if class_name in class_num_training_samples:
class_num_training_samples[class_name] += 1
else:
class_num_training_samples[class_name] = 1
max_class_samples = max(class_num_training_samples.values())
class_weights = {}
for i in range(0, len(train_ds.class_names)):
class_weights[i] = max_class_samples/class_num_training_samples[train_ds.class_names[i]]
What I am not sure about is whether this solution works, because the keras documentation does not specify the keys for the class_weights dictionary in case the labels are one-hot encoded.
I tried training the network this way but found out that the weights did not have a real influence on the resulting network: when I looked at the distribution of predicted classes for each individual class then I could recognize the distribution of the overall training set, where for each class the prediction of the dominant classes is most likely.
Running the same training without any class weight specified led to similar results.
So I suspect that the weights don't seem to have an influence in my case.
Is this because specifying class weights does not work for one-hot encoded labels, or is this because I am probably doing something else wrong (in the code I did not show here)?
Approach 2: specifying sample weight
As an attempt to come up with a different (in my opinion less elegant) solution I wanted to specify the individual sample weights via the sample_weight argument of the fit method. However from the documentation I find:
[...] This argument is not supported when x is a dataset, generator, or keras.utils.Sequence instance, instead provide the sample_weights as the third element of x.
Which is indeed the case in my setup where train_ds is a dataset. Now I really having trouble finding documentation from which I can derive how I can modify train_ds, such that it has a third element with the weight. I thought using the map method of a dataset can be useful, but the solution I came up with is apparently not valid:
train_ds = train_ds.map(lambda img, label: (img, label, class_weights[np.argmax(label)]))
Does anyone have a solution that may work in combination with a dataset loaded by image_dataset_from_directory?

Using keras.layers.Concatenate correctly [duplicate]

I am trying to merge output from two models and give them as input to the third model using keras sequential model.
Model1 :
inputs1 = Input(shape=(750,))
x = Dense(500, activation='relu')(inputs1)
x = Dense(100, activation='relu')(x)
Model1 :
inputs2 = Input(shape=(750,))
y = Dense(500, activation='relu')(inputs2)
y = Dense(100, activation='relu')(y)
Model3 :
merged = Concatenate([x, y])
final_model = Sequential()
final_model.add(merged)
final_model.add(Dense(100, activation='relu'))
final_model.add(Dense(3, activation='softmax'))
Till here, my understanding is that, output from two models as x and y are merged and given as input to the third model. But when I fit this all like,
module3.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
module3.fit([in1, in2], np_res_array)
in1 and in2 are two numpy ndarray of dimention 10000*750 which contains my training data and np_res_array is the corresponding target. This gives me error as 'list' object has no attribute 'shape' As far as know, this is how we give multiple inputs to a model, but what is this error? How do I resolve it?
You can't do this using Sequential API. That's because of two reasons:
Sequential models, as their name suggests, are a sequence of layers where each layer is connected directly to its previous layer and therefore they cannot have branches (e.g. merge layers, multiple input/output layers, skip connections, etc.).
The add() method of Sequential API accepts a Layer instance as its argument and not a Tensor instance. In your example merged is a Tensor (i.e. concatenation layer's output).
Further, the correct way of using Concatenate layer is like this:
merged = Concatenate()([x, y])
However, you can also use concatenate (note the lowercase "c"), its equivalent functional interface, like this:
merged = concatenate([x, y])
Finally, to be able to construct that third model you also need to use the functional API.

How to tie word embedding and softmax weights in keras?

Its commonplace for various neural network architectures in NLP and vision-language problems to tie the weights of an initial word embedding layer to that of an output softmax. Usually this produces a boost to sentence generation quality. (see example here)
In Keras its typical to embed word embedding layers using the Embedding class, however there seems to be no easy way to tie the weights of this layer to the output softmax. Would anyone happen to know how this could be implemented ?
Be aware that Press and Wolf dont't propose to freeze the weights to some pretrained ones, but tie them. That means, to ensure that input and output weights are always the same during training (in the sense of synchronized).
In a typical NLP model (e.g. language modelling/translation), you have an input dimension (vocabulary) of size V and a hidden representation size H. Then, you start with an Embedding layer, which is a matrix VxH. And the output layer is (probably) something like Dense(V, activation='softmax'), which is a matrix H2xV. When tying the weights, we want that those matrices are the same (therefore, H==H2).
For doing this in Keras, I think the way to go is via shared layers:
In your model, you need to instantiate a shared embedding layer (of dimension VxH), and apply it to either your input and output. But you need to transpose it, to have the desired output dimensions (HxV). So, we declare a TiedEmbeddingsTransposed layer, which transposes the embedding matrix from a given layer (and applies an activation function):
class TiedEmbeddingsTransposed(Layer):
"""Layer for tying embeddings in an output layer.
A regular embedding layer has the shape: V x H (V: size of the vocabulary. H: size of the projected space).
In this layer, we'll go: H x V.
With the same weights than the regular embedding.
In addition, it may have an activation.
# References
- [ Using the Output Embedding to Improve Language Models](https://arxiv.org/abs/1608.05859)
"""
def __init__(self, tied_to=None,
activation=None,
**kwargs):
super(TiedEmbeddingsTransposed, self).__init__(**kwargs)
self.tied_to = tied_to
self.activation = activations.get(activation)
def build(self, input_shape):
self.transposed_weights = K.transpose(self.tied_to.weights[0])
self.built = True
def compute_mask(self, inputs, mask=None):
return mask
def compute_output_shape(self, input_shape):
return input_shape[0], K.int_shape(self.tied_to.weights[0])[0]
def call(self, inputs, mask=None):
output = K.dot(inputs, self.transposed_weights)
if self.activation is not None:
output = self.activation(output)
return output
def get_config(self):
config = {'activation': activations.serialize(self.activation)
}
base_config = super(TiedEmbeddingsTransposed, self).get_config()
return dict(list(base_config.items()) + list(config.items()))
The usage of this layer is:
# Declare the shared embedding layer
shared_embedding_layer = Embedding(V, H)
# Obtain word embeddings
word_embedding = shared_embedding_layer(input)
# Do stuff with your model
# Compute output (e.g. a vocabulary-size probability vector) with the shared layer:
output = TimeDistributed(TiedEmbeddingsTransposed(tied_to=shared_embedding_layer, activation='softmax')(intermediate_rep)
I have tested this in NMT-Keras and it trains properly. But, as I try to load a trained model, it gets an error, related to the way Keras loads the models: it doesn't load the weights from the tied_to. I've found several questions regarding this (1, 2, 3), but I haven't managed to solve this issue. If someone have any ideas on the next steps to take, I'd be very glad to hear them :)
As you may read here you should simply set trainable flag to False. E.g.
aux_output = Embedding(..., trainable=False)(input)
....
output = Dense(nb_of_classes, .. ,activation='softmax', trainable=False)

How to handle gradients when training two sub-graphs simultaneously

The general idea I am trying to realize is a seq2seq-model (taken from the translate.py-example in the models, based on the seq2seq-class). This trains well.
Furthermore I am using the hidden state of the rnn after all the encoding is done, right before decoding starts (I call it the “hidden state at end of encoding”). I use this hidden state at end of encoding to feed it into a further sub-graph which I call “prices” (see below). The training gradients of this sub-graph backprop not only through this additional sub-graph, but also back into the encoder-part of the rnn (which is what I want and need).
The plan is to add more such sub-graph to the hidden state at end of encoding, as I want to analyze the input phrases in a variety of ways.
Now during training when I evaluate and train both sub-graphs (encoder+prices AND encoder+decoder) at the same time, the net does NOT converge. However, if I train by executing the training in the following way (pseudo-code):
if global_step % 10 == 0:
execute-the-price-training_code
else:
execute-the-decoder-training_code
So I am not training both sub-graphs simultaneously. Now it does converge, but the encoder+decoder-part converges MUCH slower than if I ONLY train this part and never train the prices-sub-graph.
My question is: I should be able to train both sub-graphs simultaneously. But probably I have to rescale the gradients flowing back into the hidden state at end of encoding. Here we get the gradients from the prices sub-graph AND from the decoder-sub-graph. How should this rescaling be done. I didnt find any papers describing such an undertaking, but maybe I am searching with the wrong keywords.
Here is the training-part of the code:
This is the (almost original) training-op-preparation:
if not forward_only:
self.gradient_norms = []
self.updates = []
opt = tf.train.AdadeltaOptimizer(self.learning_rate)
for bucket_id in xrange(len(buckets)):
tf.scalar_summary("seq2seq loss", self.losses[bucket_id])
gradients = tf.gradients(self.losses[bucket_id], var_list_seq2seq)
clipped_gradients, norm = tf.clip_by_global_norm(gradients, max_gradient_norm)
self.gradient_norms.append(norm)
self.updates.append(opt.apply_gradients(zip(clipped_gradients, var_list_seq2seq), global_step=self.global_step))
Now, additionally, I am running a second sub-graph that takes the hidden state at end of encoding as input:
with tf.name_scope('prices') as scope:
#First layer
W_price_first_layer = tf.Variable(tf.random_normal([num_layers*size, self.prices_hidden_layer_size], stddev=0.35), name="W_price_first_layer")
B_price_first_layer = tf.Variable(tf.zeros([self.prices_hidden_layer_size]), name="B_price_first_layer")
self.output_price_first_layer = tf.add(tf.matmul(self.hidden_state, W_price_first_layer), B_price_first_layer)
self.activation_price_first_layer = tf.nn.sigmoid(self.output_price_first_layer)
#self.activation_price_first_layer = tf.nn.Relu(self.output_price_first_layer)
#Second layer to softmax (price ranges)
W_price = tf.Variable(tf.random_normal([self.prices_hidden_layer_size, self.prices_bit_size], stddev=0.35), name="W_price")
W_price_t = tf.transpose(W_price)
B_price = tf.Variable(tf.zeros([self.prices_bit_size]), name="B_price")
self.output_price_second_layer = tf.add(tf.matmul(self.activation_price_first_layer, W_price),B_price)
self.price_prediction = tf.nn.softmax(self.output_price_second_layer)
self.label_price = tf.placeholder(tf.int32, shape=[self.batch_size], name="price_label")
#Remember the prices trainables
var_list_prices = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, "prices")
var_list_all = tf.trainable_variables()
#Backprop
self.loss_price = tf.nn.sparse_softmax_cross_entropy_with_logits(self.output_price_second_layer, self.label_price)
self.loss_price_scalar = tf.reduce_mean(self.loss_price)
self.optimizer_price = tf.train.AdadeltaOptimizer(self.learning_rate_prices)
self.training_op_price = self.optimizer_price.minimize(self.loss_price, var_list=var_list_all)
Thx a bunch
I expect that running two optimizers simultaneously will lead to inconsistent gradient updates on the common variables, and this might be causing your training not to converge.
Instead, if you add the scalar loss from each sub-network to the "losses collection" (e.g. via tf.contrib.losses.add_loss() or tf.add_to_collection(tf.GraphKeys.LOSSES, ...), you can use tf.contrib.losses.get_total_loss() to get a single loss value that can be passed to a single standard TensorFlow tf.train.Optimizer subclass. TensorFlow will derive the appropriate back-prop computation for your split network.
The get_total_loss() method simply computes an unweighted sum of the values that have been added to the losses collection. I'm not familiar with the literature on how or if you should scale these values, but you can use any arbitrary (differentiable) TensorFlow expression to combine the losses and pass the result to a single optimizer.

Resources