I have a function that takes an (image , hidden state) as an input, and feedback the output back to the input, This goes for the length of the sequence and the last output is of interest to me :-
The recurrence function is as below:-
def recurrence():
input_image = Input(shape=[dim1,dim2,dim3])
hidden_state= Input(shape=[4,4,4,128])
conv1a = Conv2D(filters=96, kernel_size=7, padding='same',input_shape=(dim1, dim2,dim3), kernel_initializer=k_init, bias_initializer=b_init)(input_image)
conv1a = LeakyReLU(0.01)(conv1a)
conv1b = Conv2D(filters=96, kernel_size=3, padding='same', kernel_initializer=k_init, bias_initializer=b_init)(conv1a)
conv1b = LeakyReLU(0.01)(conv1b)
conv1b = ZeroPadding2D(padding=(1, 1))(conv1b)
pool1 = MaxPooling2D(2)(conv1b)
flat6 = Flatten()(pool1)
fc7 = Dense(units=1024, kernel_initializer=k_init, bias_initializer=b_init)(flat6)
rect7 = LeakyReLU(0.01)(fc7)
t_x_s_update_conv = Conv3D(128, 3, activation=None, padding='same', kernel_initializer=k_init, use_bias=False)(hidden_state)
t_x_s_update_dense =Reshape((4,4,4,128))(Dense(units=8192)(rect7))
t_x_s_update = layers.add([t_x_s_update_conv, t_x_s_update_dense])
t_x_s_reset_conv = Conv3D(128, 3, activation=None, padding='same', kernel_initializer=k_init, use_bias=False)(hidden_state)
t_x_s_reset_dense =Reshape((4,4,4,128))(Dense(units=8192)(rect7))
t_x_s_reset =layers.add([t_x_s_reset_conv, t_x_s_reset_dense])
update_gate = Activation(K.sigmoid)(t_x_s_update)
comp_update_gate = Lambda(lambda x: 1 - x)(update_gate)
reset_gate = Activation(K.sigmoid)(t_x_s_reset)
rs = layers.multiply([reset_gate, hidden_state])
t_x_rs_conv = Conv3D(128, 3, activation=None, padding='same', kernel_initializer=k_init, use_bias=False)(rs)
t_x_rs_dense = Reshape((4,4,4,128))(Dense(units=8192)(rect7))
t_x_rs = layers.add([t_x_rs_conv, t_x_rs_dense])
tanh_t_x_rs = Activation(K.tanh)(t_x_rs)
gru_out = layers.add([layers.multiply([update_gate, hidden_state]), layers.multiply([comp_update_gate, tanh_t_x_rs])])
return Model(inputs=[input_image,hidden_state],outputs=gru_out)
Currently i am achieving this using a loop and instantiating a model once and then reusing it like below:-
step=recurrence()
s_update = hidden_init
for i in input_imgs:
s_update = step([i, s_update])`
But this method seems to be suffer from the following two disadvantages:
i) It consumes a lot of GPU memory considering all variables are shared.
ii) It doesn't work for an arbitrary length sequence without padding.
Is there a better and efficient way to achieve this , I tried reading the code for SimpleRNN in the recurrent.py file , but couldn't understand how would i incorporate all these layers like Convolution3d into that framework?
A dummy code would be really appreciated or any help you could provide. Thanks !
Related
I wrote a script using xgboost to predict soil class for a certain area using data from field and satellite images. The script as below:
`
rm(list=ls())
library(xgboost)
library(caret)
library(raster)
library(sp)
library(rgeos)
library(ggplot2)
setwd("G:/DATA")
data <- read.csv('96PointsClay02finalone.csv')
head(data)
summary(data)
dim(data)
ras <- stack("Allindices04TIFF.tif")
names(ras) <- c("b1", "b2", "b3", "b4", "b5", "b6", "b7", "b10", "b11","DEM",
"R1011", "SCI", "SAVI", "NDVI", "NDSI", "NDSandI", "MBSI",
"GSI", "GSAVI", "EVI", "DryBSI", "BIL", "BI","SRCI")
set.seed(27) # set seed for generating random data.
# createDataPartition() function from the caret package to split the original dataset into a training and testing set and split data into training (80%) and testing set (20%)
parts = createDataPartition(data$Clay, p = .8, list = F)
train = data[parts, ]
test = data[-parts, ]
#define predictor and response variables in training set
train_x = data.matrix(train[, -1])
train_y = train[,1]
#define predictor and response variables in testing set
test_x = data.matrix(test[, -1])
test_y = test[, 1]
#define final training and testing sets
xgb_train = xgb.DMatrix(data = train_x, label = train_y)
xgb_test = xgb.DMatrix(data = test_x, label = test_y)
#defining a watchlist
watchlist = list(train=xgb_train, test=xgb_test)
#fit XGBoost model and display training and testing data at each iteartion
model = xgb.train(data = xgb_train, max.depth = 3, watchlist=watchlist, nrounds = 100)
#define final model
model_xgboost = xgboost(data = xgb_train, max.depth = 3, nrounds = 86, verbose = 0)
summary(model_xgboost)
#use model to make predictions on test data
pred_y = predict(model_xgboost, xgb_test)
# performance metrics on the test data
mean((test_y - pred_y)^2) #mse - Mean Squared Error
caret::RMSE(test_y, pred_y) #rmse - Root Mean Squared Error
y_test_mean = mean(test_y)
rmseE<- function(error)
{
sqrt(mean(error^2))
}
y = test_y
yhat = pred_y
rmseresult=rmseE(y-yhat)
(r2 = R2(yhat , y, form = "traditional"))
cat('The R-square of the test data is ', round(r2,4), ' and the RMSE is ', round(rmseresult,4), '\n')
#use model to make predictions on satellite image
result <- predict(model_xgboost, ras[1:(nrow(ras)*ncol(ras))])
#create a result raster
res <- raster(ras)
#fill in results and add a "1" to them (to get back to initial class numbering! - see above "Prepare data" for more information)
res <- setValues(res,result+1)
#Save the output .tif file into saved directory
writeRaster(res, "xgbmodel_output", format = "GTiff", overwrite=T)
`
The script works well till it reachs
result <- predict(model_xgboost, ras[1:(nrow(ras)*ncol(ras))])
it takes some time then gives this error:
Error in predict.xgb.Booster(model_xgboost, ras[1:(nrow(ras) * ncol(ras))]) :
Feature names stored in `object` and `newdata` are different!
I realize that I am doing something wrong in that line. However, I do not know how to apply the xgboost model to a raster image that represents my study area.
It would be highly appreciated if someone give a hand, enlightened me, and helped me solve this problem....
My data as csv and raster image can be found here.
Finally, I got the reason for this error.
It was my mistake as the number of columns in the traning data was not the same as in the number of layers in the satellite image.
I have a class imbalance issue with my data, and following the guide here, I'm trying to downsample during a cv. I'd like to be able to inspect individual folds after running the model, but when I attempt to pull an individual fold from the training set I don't see balanced classes like I expected. Does the downsampling occur after the creation of the fold? If so, how would I retrieve those indices? A sample:
set.seed(2969)
imbal_train <- twoClassSim(10000, intercept = -20, linearVars = 20)
tr_ctrl <- trainControl(method = "cv", number = 5,
classProbs = TRUE,
p=0.5,
summaryFunction = twoClassSummary,
sampling = "down")
testModel<-train(Class ~ .,
data = imbal_train,
method = "rf",
metric = "ROC",
trControl = tr_ctrl)
fold1<-imbal_train[testModel$control$index$Fold1,]
table(fold1$Class)
Class1 7529 Class2 472
I'm trying to create a Gaussian HMM model in pyro to infer the parameters of a very simple Markov sequence. However, my model fails to infer the parameters and something wired happened during the training process. Using the same sequence, hmmlearn has successfully infer the true parameters.
Full code can be accessed in here:
https://colab.research.google.com/drive/1u_4J-dg9Y1CDLwByJ6FL4oMWMFUVnVNd#scrollTo=ZJ4PzdTUBgJi
My model is modified from the example in here:
https://github.com/pyro-ppl/pyro/blob/dev/examples/hmm.py
I manually created a first order Markov sequence where there are 3 states, the true means are [-10, 0, 10], sigmas are [1,2,1].
Here is my model
def model(observations, num_state):
assert not torch._C._get_tracing_state()
with poutine.mask(mask = True):
p_transition = pyro.sample("p_transition",
dist.Dirichlet((1 / num_state) * torch.ones(num_state, num_state)).to_event(1))
p_init = pyro.sample("p_init",
dist.Dirichlet((1 / num_state) * torch.ones(num_state)))
p_mu = pyro.param(name = "p_mu",
init_tensor = torch.randn(num_state),
constraint = constraints.real)
p_tau = pyro.param(name = "p_tau",
init_tensor = torch.ones(num_state),
constraint = constraints.positive)
current_state = pyro.sample("x_0",
dist.Categorical(p_init),
infer = {"enumerate" : "parallel"})
for t in pyro.markov(range(1, len(observations))):
current_state = pyro.sample("x_{}".format(t),
dist.Categorical(Vindex(p_transition)[current_state, :]),
infer = {"enumerate" : "parallel"})
pyro.sample("y_{}".format(t),
dist.Normal(Vindex(p_mu)[current_state], Vindex(p_tau)[current_state]),
obs = observations[t])
My model is compiled as
device = torch.device("cuda:0")
obs = torch.tensor(obs)
obs = obs.to(device)
torch.set_default_tensor_type("torch.cuda.FloatTensor")
guide = AutoDelta(poutine.block(model, expose_fn = lambda msg : msg["name"].startswith("p_")))
Elbo = Trace_ELBO
elbo = Elbo(max_plate_nesting = 1)
optim = Adam({"lr": 0.001})
svi = SVI(model, guide, optim, elbo)
As the training goes, the ELBO has decreased steadily as shown. However, the three means of the states converges.
I have tried to put the for loop of my model into a pyro.plate and switch pyro.param to pyro.sample and vice versa, but nothing worked for my model.
I have not tried this model, but I think it should be possible to solve the problem by modifying the model in the following way:
def model(observations, num_state):
assert not torch._C._get_tracing_state()
with poutine.mask(mask = True):
p_transition = pyro.sample("p_transition",
dist.Dirichlet((1 / num_state) * torch.ones(num_state, num_state)).to_event(1))
p_init = pyro.sample("p_init",
dist.Dirichlet((1 / num_state) * torch.ones(num_state)))
p_mu = pyro.sample("p_mu",
dist.Normal(torch.zeros(num_state), torch.ones(num_state)).to_event(1))
p_tau = pyro.sample("p_tau",
dist.HalfCauchy(torch.zeros(num_state)).to_event(1))
current_state = pyro.sample("x_0",
dist.Categorical(p_init),
infer = {"enumerate" : "parallel"})
for t in pyro.markov(range(1, len(observations))):
current_state = pyro.sample("x_{}".format(t),
dist.Categorical(Vindex(p_transition)[current_state, :]),
infer = {"enumerate" : "parallel"})
pyro.sample("y_{}".format(t),
dist.Normal(Vindex(p_mu)[current_state], Vindex(p_tau)[current_state]),
obs = observations[t])
The model would then be trained using MCMC:
# MCMC
hmc_kernel = NUTS(model, target_accept_prob = 0.9, max_tree_depth = 7)
mcmc = MCMC(hmc_kernel, num_samples = 1000, warmup_steps = 100, num_chains = 1)
mcmc.run(obs)
The results could then be analysed using:
mcmc.get_samples()
What I am trying to do:
I want to connect any Layers from different models to create a new keras model.
What I found so far:
https://github.com/keras-team/keras/issues/4205: using the Model's call class to change the input of another model. My problems with this approach:
Can only change the input of the Model, no other layers. So if I want to cut off some layers at the beginning of the encoder, that is not possible
Not a fan of the nested array structure when getting the config file. Would prefer to have a 1D-array
When using model.summary() or plot_model(), the encoder only shows as "Model". If anything I would say both models should be wrapped. So the config should show [model_base, model_encoder] and not [base_input, base_conv2D, ..., encoder_model]
To be fair, with this approach: https://github.com/keras-team/keras/issues/3021, the point above is actually possible, but again, it is very inflexible. As soon as I want to cut off some layers at the top or bottom of the base or encoder network, this approach fails
https://github.com/keras-team/keras/issues/3465: Adding new layers to a base model by using any output of the base model. Problems here:
While it is possible to use any layer from the base model, which means I can cut off layers from the base model, I can not load the encoder as a keras model. The top models always must be created new.
What I have tried:
My approach to connecting any layers from different models:
Clear inbound nodes of input layer
use the call() method of the output layer with the tensor of the output layer
Clean up the outbound nodes of the output tensor by switching out the new created tensor with the previous output tensor
I was really optimistic at first, as the summary() and the plot_model() got me exactly what I wanted, thus the Node graph should be fine, right? But I ran into errors when training. While the approach in the "What I found so far" section trained fine, I ran into an error with my approach. This is the error message:
File "C:\Anaconda\envs\dlpipe\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 508, in apply_op
(input_name, err))
ValueError: Tried to convert 'x' to a tensor and failed. Error: None values not supported.
Might be an important info, that I am using Tensorflow as backend. I was able to trace back the root of this error. It seems like there is an error when the gradients are calculated. Usually, there is a gradient calculation for each node, but all the nodes of the base network have "None" when using my approach. So basically in keras/optimizers.py, get_updates() when the gradients are calculated (grad = self.get_gradients(loss, params)).
Here is the code (without the training), with all three approaches implemented:
def create_base():
in_layer = Input(shape=(32, 32, 3), name="base_input")
x = Conv2D(32, (3, 3), padding='same', activation="relu", name="base_conv2d_1")(in_layer)
x = Conv2D(32, (3, 3), padding='same', activation="relu", name="base_conv2d_2")(x)
x = MaxPooling2D(pool_size=(2, 2), name="base_maxpooling_2d_1")(x)
x = Dropout(0.25, name="base_dropout")(x)
x = Conv2D(64, (3, 3), padding='same', activation="relu", name="base_conv2d_3")(x)
x = Conv2D(64, (3, 3), padding='same', activation="relu", name="base_conv2d_4")(x)
x = MaxPooling2D(pool_size=(2, 2), name="base_maxpooling2d_2")(x)
x = Dropout(0.25, name="base_dropout_2")(x)
return Model(inputs=in_layer, outputs=x, name="base_model")
def create_encoder():
in_layer = Input(shape=(8, 8, 64))
x = Flatten(name="encoder_flatten")(in_layer)
x = Dense(512, activation="relu", name="encoder_dense_1")(x)
x = Dropout(0.5, name="encoder_dropout_2")(x)
x = Dense(10, activation="softmax", name="encoder_dense_2")(x)
return Model(inputs=in_layer, outputs=x, name="encoder_model")
def extend_base(input_model):
x = Flatten(name="custom_flatten")(input_model.output)
x = Dense(512, activation="relu", name="custom_dense_1")(x)
x = Dropout(0.5, name="custom_dropout_2")(x)
x = Dense(10, activation="softmax", name="custom_dense_2")(x)
return Model(inputs=input_model.input, outputs=x, name="custom_edit")
def connect_layers(from_tensor, to_layer, clear_inbound_nodes=True):
try:
tmp_output = to_layer.output
except AttributeError:
raise ValueError("Connecting to shared layers is not supported!")
if clear_inbound_nodes:
to_layer.inbound_nodes = []
else:
tensor_list = to_layer.inbound_nodes[0].input_tensors
tensor_list.append(from_tensor)
from_tensor = tensor_list
to_layer.inbound_nodes = []
new_output = to_layer(from_tensor)
for out_node in to_layer.outbound_nodes:
for i, in_tensor in enumerate(out_node.input_tensors):
if in_tensor == tmp_output:
out_node.input_tensors[i] = new_output
if __name__ == "__main__":
base = create_base()
encoder = create_encoder()
#new_model_1 = Model(inputs=base.input, outputs=encoder(base.output))
#plot_model(new_model_1, to_file="plots/new_model_1.png")
new_model_2 = extend_base(base)
plot_model(new_model_2, to_file="plots/new_model_2.png")
print(new_model_2.summary())
base_layer = base.get_layer("base_dropout_2")
top_layer = encoder.get_layer("encoder_flatten")
connect_layers(base_layer.output, top_layer)
new_model_3 = Model(inputs=base.input, outputs=encoder.output)
plot_model(new_model_3, to_file="plots/new_model_3.png")
print(new_model_3.summary())
I know this is a lot of text and a lot of code. But I feel like it is needed to explain the issue here.
EDIT: I just tried thenao and I think the error gives away more information:
theano.gradient.DisconnectedInputError:
Backtrace when that variable is created:
It seems like every layer from the encoder model has some connection with the encoder input layer via TensorVariables.
So this is what I ended up with for the connect_layer() function:
def connect_layers(from_tensor, to_layer, old_tensor=None):
# if there is any shared layer after the to_layer, it is not supported
try:
tmp_output = to_layer.output
except AttributeError:
raise ValueError("Connecting to shared layers is not supported!")
# check if to_layer has multiple input_tensors, and therefore some sort of merge layer
if len(to_layer.inbound_nodes[0].input_tensors) > 1:
tensor_list = to_layer.inbound_nodes[0].input_tensors
found_tensor = False
for i, tensor in enumerate(tensor_list):
# exchange the old tensor with the new created tensor
if tensor == old_tensor:
tensor_list[i] = from_tensor
found_tensor = True
break
if not found_tensor:
tensor_list.append(from_tensor)
from_tensor = tensor_list
to_layer.inbound_nodes = []
else:
to_layer.inbound_nodes = []
new_output = to_layer(from_tensor)
tmp_out_nodes = to_layer.outbound_nodes[:]
to_layer.outbound_nodes = []
# recursively connect all layers after the current to_layer
for out_node in tmp_out_nodes:
l = out_node.outbound_layer
print("Connecting: " + str(to_layer) + " ----> " + str(l))
connect_layers(new_output, l, tmp_output)
As each Tensor has all the information about it's root tensor via -> owner.inputs -> owner.inputs -> ..., all tensor following the new_output tensor must be updated.
It was a lot easier to debug that with theano then with tensorflow backend.
I still need to figure out how to deal with shared layers. With the current implementation it is not possible to connect other models that contain a shared layer after the first to_layer.
I have a .tfrecords file filled with labeled data. I'd like to use X% of them for training and (1-X)% for evaluation/testing. Obviously there shouldn't be any overlap. What is the best way of doing this?
Below is my small block of code for reading tfrecords. Is there some way I can get shuffle_batch to split the data into training and evaluation data? Am I going about this incorrectly?
reader = tf.TFRecordReader()
files = tf.train.string_input_producer([TFRECORDS_FILE], num_epochs=num_epochs)
read_name, serialized_examples = reader.read(files)
features = tf.parse_single_example(
serialized = serialized_examples,
features={
'image': tf.FixedLenFeature([], tf.string),
'value': tf.FixedLenFeature([], tf.string),
})
image = tf.decode_raw(features['image'], tf.uint8)
value = tf.decode_raw(features['value'], tf.uint8)
image, value = tf.train.shuffle_batch([image, value],
enqueue_many = False,
batch_size = 4,
capacity = 30,
num_threads = 3,
min_after_dequeue = 10)
Although the question was asked over a year ago, I had a similar question recently.
I used tf.data.Dataset with filters on input hash. Here is a sample:
dataset = tf.data.TFRecordDataset(files)
if is_evaluation:
dataset = dataset.filter(
lambda r: tf.string_to_hash_bucket_fast(r, 10) == 0)
else:
dataset = dataset.filter(
lambda r: tf.string_to_hash_bucket_fast(r, 10) != 0)
dataset = dataset.map(tf.parse_single_example)
return dataset
One of the downsides that I have noticed so far that each evaluation may require data traversal 10x to collect enough data. To avoid this you may want to separate data at the data preprocessing time.