MxNet has trouble saving all parameters of a network - machine-learning

In my experiment, the MxNet may forget saving some parameters of my network.
I am studying mxnet’s gluoncv package (https://gluon-cv.mxnet.io/index.html). To learn the programming skills from the engineers, I manually generate an SSD with ‘gluoncv.model_zoo.ssd.SSD’. The parameters that I use to initialize this class are the same as the official ‘ssd_512_resnet50_v1_voc’ network except ‘classes=('car', 'pedestrian', 'truck', 'trafficLight', 'biker')’.
from gluoncv.model_zoo.ssd import SSD
import mxnet as mx
name = 'resnet50_v1'
base_size = 512
features=['stage3_activation5', 'stage4_activation2']
filters=[512, 512, 256, 256]
sizes=[51.2, 102.4, 189.4, 276.4, 363.52, 450.6, 492]
ratios=[[1, 2, 0.5]] + [[1, 2, 0.5, 3, 1.0/3]] * 3 + [[1, 2, 0.5]] * 2
steps=[16, 32, 64, 128, 256, 512]
classes=('car', 'pedestrian', 'truck', 'trafficLight', 'biker')
pretrained=True
net = SSD(network = name, base_size = base_size, features = features,
num_filters = filters, sizes = sizes, ratios = ratios, steps = steps,
pretrained=pretrained, classes=classes)
I try to feed a manmade data x to this network, and it gives following errors.
x = mx.nd.zeros(shape=(batch_size,3,base_size,base_size))
cls_preds, box_preds, anchors = net(x)
RuntimeError: Parameter 'ssd0_expand_trans_conv0_weight' has not been initialized. Note that you should initialize parameters and create Trainer with Block.collect_params() instead of Block.params because the later does not include Parameters of nested child Blocks
This is reasonable. The SSD uses function ‘gluoncv.nn.feature.FeatureExpander’ to add new layers on the '_resnet50_v1_', and I forget to initialize them. So, I use following codes.
net.initialize()
Oho, it gives me a lot of warnings.
v.initialize(None, ctx, init, force_reinit=force_reinit)
C:\Users\Bird\AppData\Local\conda\conda\envs\ssd\lib\site-packages\mxnet\gluon\parameter.py:687: UserWarning: Parameter 'ssd0_resnetv10_stage4_batchnorm9_running_mean' is already initialized, ignoring. Set force_reinit=True to re-initialize.
v.initialize(None, ctx, init, force_reinit=force_reinit)
C:\Users\Bird\AppData\Local\conda\conda\envs\ssd\lib\site-packages\mxnet\gluon\parameter.py:687: UserWarning: Parameter 'ssd0_resnetv10_stage4_batchnorm9_running_var' is already initialized, ignoring. Set force_reinit=True to re-initialize.
v.initialize(None, ctx, init, force_reinit=force_reinit)
The '_resnet50_v1_' which is the base of SSD are pre-trained, so these parameters cannot be installed. However, these warnings are annoying.
How can I turn them off?
Here, though, comes the first problem. I would like to save the parameters of the network.
net.save_params('F:/Temps/Models_tmp/' +'myssd.params')
The parameter file of _'resnet50_v1_' (‘resnet50_v1-c940b1a0.params’) is 97.7MB; however, my parameter file is only 9.96MB. Are there some magical technologies to compress these parameters?
To test this new technology, I open a new console and rebuild the same network. Then, I load the saved parameters and feed a data to it.
net.load_params('F:/Temps/Models_tmp/' +'myssd.params')
x = mx.nd.zeros(shape=(batch_size,3,base_size,base_size))
The initialization error comes again.
RuntimeError: Parameter 'ssd0_expand_trans_conv0_weight' has not been initialized. Note that you should initialize parameters and create Trainer with Block.collect_params() instead of Block.params because the later does not include Parameters of nested child Blocks
This cannot be right because the saved file 'myssd.params' should contain all the installed parameters of my network.
To find the block ‘_ssd0_expand_trans_conv0’, I do a deeper research in ‘gluoncv.nn.feature. FeatureExpander_’. I use ‘mxnet.gluon. nn.Conv2D’ to replace ‘mx.sym.Convolution’ in the ‘FeatureExpander’ function.
'''
y = mx.sym.Convolution(
y, num_filter=num_trans, kernel=(1, 1), no_bias=use_bn,
name='expand_trans_conv{}'.format(i), attr={'__init__': weight_init})
'''
Conv1 = nn.Conv2D(channels = num_trans,kernel_size = (1, 1),use_bias = use_bn,weight_initializer = weight_init)
y = Conv1(y)
Conv1.initialize(verbose = True)
'''
y = mx.sym.Convolution(
y, num_filter=f, kernel=(3, 3), pad=(1, 1), stride=(2, 2),
no_bias=use_bn, name='expand_conv{}'.format(i), attr={'__init__': weight_init})
'''
Conv2 = nn.Conv2D(channels = f,kernel_size = (3, 3),padding = (1, 1),strides = (2, 2),use_bias = use_bn, weight_initializer = weight_init)
y = Conv2(y)
Conv2.initialize(verbose = True)
These new blocks can be initialized manually. However, the MxNet still report the same errors.
It seems that the manual initialization is of no effect.
How can I save all the parameters of my network and restore them?

There is a tutorial on the subject of saving and loading that may be of help:
https://mxnet.apache.org/versions/1.6/api/python/docs/tutorials/packages/gluon/blocks/save_load_params.html

Related

Rewriting ParamSet ids from mlr3::paradox()

Let's say I have the following ParamSet object:
my_ps = paradox::ps(
minsplit = p_int(1, 64, logscale = TRUE),
cp = p_dbl(1e-04, 1, logscale = TRUE))
Is it possible to rename minsplit to survTree.minsplit without changing anything else?
The reason for this is that I use some learners as part of a GraphLearner and so their parameters names changed and I would like to have some code that adds the learner$id in front the parameters to use later for tuning (rather than rewriting them from scratch with the new names)
I think I have a partial solution here. It is only partial, because it does not support the transformation.
Where it works:
library(paradox)
my_ps = paradox::ps(
minsplit = p_int(1, 64),
cp = p_dbl(1e-04, 1)
)
my_ps$set_id = "john"
my_psc = ParamSetCollection$new(list(my_ps))
print(my_psc)
#> <ParamSetCollection>
#> id class lower upper nlevels default value
#> 1: john.minsplit ParamInt 1e+00 64 64 <NoDefault[3]>
#> 2: john.cp ParamDbl 1e-04 1 Inf <NoDefault[3]>
Created on 2022-12-07 by the reprex package (v2.0.1)
Where it does not:
library(paradox)
my_ps = paradox::ps(
minsplit = p_int(1, 64, logscale = TRUE),
cp = p_dbl(1e-04, 1)
)
my_ps$set_id = "john"
my_psc = ParamSetCollection$new(list(my_ps))
#> Error in .__ParamSetCollection__initialize(self = self, private = private, : Building a collection out sets, where a ParamSet has a trafo is currently unsupported!
Created on 2022-12-07 by the reprex package (v2.0.1)
The underlying problem is that we did not solve the problem of how to reconcile the parameter transformations of individual ParamSets and a possible parameter transformation of the ParamSetCollection
I fear that there is currently no neat solution for your problem.
Sorry I can not comment yet, this is not exactly the solution you are looking for but I hope this will fix the problem you are having.
You can set the param_space in the learner, before putting it in the graph, i.e. sticking with your search space. After you create the GraphLearner regularly it will have the desired search space.
A concrete example:
library(mlr3verse)
learner = lrn("regr.rpart", cp = to_tune(0.1, 0.2))
glrn = as_learner(po("pca") %>>% po("learner", learner))
at = auto_tuner(
"random_search",
glrn,
rsmp("holdout"),
term_evals = 10
)
task = tsk("mtcars")
at$train(task)

question for dask output when using dask.array.map_overlap

I would like to use dask.array.map_overlap to deal with the scipy interpolation function. However, I keep meeting errors that I cannot understand and hoping someone can answer this to me.
Here is the error message I have received if I want to run .compute().
ValueError: could not broadcast input array from shape (1070,0) into shape (1045,0)
To resolve the issue, I started to use .to_delayed() to check each partition outputs, and this is what I found.
Following is my python code.
Step 1. Load netCDF file through Xarray, and then output to dask.array with chunk size (400,400)
df = xr.open_dataset('./Brazil Sentinal2 Tile/' + data_file +'.nc')
lon, lat = df['lon'].data, df['lat'].data
slon = da.from_array(df['lon'], chunks=(400,400))
slat = da.from_array(df['lat'], chunks=(400,400))
data = da.from_array(df.isel(band=0).__xarray_dataarray_variable__.data, chunks=(400,400))
Step 2. declare a function for da.map_overlap use
def sumsum2(lon,lat,data, hex_res=10):
hex_col = 'hex' + str(hex_res)
lon_max, lon_min = lon.max(), lon.min()
lat_max, lat_min = lat.max(), lat.min()
b = box(lon_min, lat_min, lon_max, lat_max, ccw=True)
b = transform(lambda x, y: (y, x), b)
b = mapping(b)
target_df = pd.DataFrame(h3.polyfill( b, hex_res), columns=[hex_col])
target_df['lat'] = target_df[hex_col].apply(lambda x: h3.h3_to_geo(x)[0])
target_df['lon'] = target_df[hex_col].apply(lambda x: h3.h3_to_geo(x)[1])
tlon, tlat = target_df[['lon','lat']].values.T
abc = lNDI(points=(lon.ravel(), lat.ravel()),
values= data.ravel())(tlon,tlat)
target_df['out'] = abc
print(np.stack([tlon, tlat, abc],axis=1).shape)
return np.stack([tlon, tlat, abc],axis=1)
Step 3. Apply the da.map_overlap
b = da.map_overlap(sumsum2, slon[:1200,:1200], slat[:1200,:1200], data[:1200,:1200], depth=10, trim=True, boundary=None, align_arrays=False, dtype='float64',
)
Step 4. Using to_delayed() to test output shape
print(b.to_delayed().flatten()[0].compute().shape, )
print(b.to_delayed().flatten()[1].compute().shape)
(1065, 3)
(1045, 0)
(1090, 3)
(1070, 0)
which is saying that the output from da.map_overlap is only outputting 1-D dimension ( which is (1045,0) and (1070,0) ), while in the da.map_overlap, the output I am preparing is 2-D dimension ( which is (1065,3) and (1090,3) ).
In addition, if I turn off the trim argument, which is
c = da.map_overlap(sumsum2,
slon[:1200,:1200],
slat[:1200,:1200],
data[:1200,:1200],
depth=10,
trim=False,
boundary=None,
align_arrays=False,
dtype='float64',
)
print(c.to_delayed().flatten()[0].compute().shape, )
print(c.to_delayed().flatten()[1].compute().shape)
The output becomes
(1065, 3)
(1065, 3)
(1090, 3)
(1090, 3)
This is saying that when trim=True, I cut out everything?
because...
#-- print out the values
b.to_delayed().flatten()[0].compute()[:10,:]
(1065, 3)
array([], shape=(1045, 0), dtype=float64)
while...
#-- print out the values
c.to_delayed().flatten()[0].compute()[:10,:]
array([[ -47.83683837, -18.98359832, 1395.01848583],
[ -47.8482856 , -18.99038681, 2663.68391094],
[ -47.82800624, -18.99207069, 1465.56517187],
[ -47.81897323, -18.97919009, 2769.91556363],
[ -47.82066663, -19.00712956, 1607.85927095],
[ -47.82696896, -18.97167714, 2110.7516765 ],
[ -47.81562653, -18.98302933, 2662.72112163],
[ -47.82176881, -18.98594465, 2201.83205114],
[ -47.84567 , -18.97512514, 1283.20631652],
[ -47.84343568, -18.97270783, 1282.92117225]])
Any thoughts for this?
Thank You.
I guess I got the answer. Please let me if I am wrong.
I am not allowing to use trim=True is because I change the shape of output array (after surfing the internet, I notice that the shape of output array should be the same with the shape of input array). Since I change the shape, the dask has no idea how to deal with it so it returns the empty array to me (weird).
Instead of using trim=False, since I didn't ask cutting-out the buffer zone, it is now okay to output the return values. (although I still don't know why the dask cannot concat the chunked array, but believe is also related to shape)
The solution is using delayed function on da.concatenate, which is
delayed(da.concatenate)([e.to_delayed().flatten()[idx] for idx in range(len(e.to_delayed().flatten()))])
In this case, we are not relying on the concat function in map_overlap but use our own concat to combine the outputs we want.

no method matching logpdf when sampling from uniform distribution

I am trying to use reinforcement learning in julia to teach a car that is constantly being accelerated backwards (but with a positive initial velocity) to apply brakes so that it gets as close to a target distance as possible before moving backwards.
To do this, I am making use of POMDPs.jl and crux.jl which has many solvers (I'm using DQN). I will list what I believe to be the relevant parts of the script first, and then more of it towards the end.
To define the MDP, I set the initial position, velocity, and force from the brakes as a uniform distribution over some values.
#with_kw struct SliderMDP <: MDP{Array{Float32}, Array{Float32}}
x0 = Distributions.Uniform(0., 80.)# Distribution to sample initial position
v0 = Distributions.Uniform(0., 25.) # Distribution to sample initial velocity
d0 = Distributions.Uniform(0., 2.) # Distribution to sample brake force
...
end
My state holds the values of (position, velocity, brake force), and the initial state is given as:
function POMDPs.initialstate(mdp::SliderMDP)
ImplicitDistribution((rng) -> Float32.([rand(rng, mdp.x0), rand(rng, mdp.v0), rand(rng, mdp.d0)]))
end
Then, I set up my DQN solver using crux.jl and called a function to solve for the policy
solver_dqn = DQN(π=Q_network(), S=s, N=30000)
policy_dqn = solve(solver_dqn, mdp)
calling solve() gives me the error MethodError: no method matching logpdf(::Distributions.Categorical{Float64, Vector{Float64}}, ::Nothing). I am quite sure that this comes from the initial state sampling, but I am not sure why or how to fix it. I have only been learning RL from various books and online lectures for a very short time, so any help regarding the error or my the model I set up (or anything else I'm oblivious to) would be appreciated.
More comprehensive code:
Packages:
using POMDPs
using POMDPModelTools
using POMDPPolicies
using POMDPSimulators
using Parameters
using Random
using Crux
using Flux
using Distributions
Rest of it:
#with_kw struct SliderMDP <: MDP{Array{Float32}, Array{Float32}}
x0 = Distributions.Uniform(0., 80.)# Distribution to sample initial position
v0 = Distributions.Uniform(0., 25.) # Distribution to sample initial velocity
d0 = Distributions.Uniform(0., 2.) # Distribution to sample brake force
m::Float64 = 1.
tension::Float64 = 3.
dmax::Float64 = 2.
target::Float64 = 80.
dt::Float64 = .05
γ::Float32 = 1.
actions::Vector{Float64} = [-.1, 0., .1]
end
function POMDPs.gen(env::SliderMDP, s, a, rng::AbstractRNG = Random.GLOBAL_RNG)
x, ẋ, d = s
if x >= env.target
a = .1
end
if d+a >= env.dmax || d+a <= 0
a = 0.
end
force = (d + env.tension) * -1
ẍ = force/env.m
# Simulation
x_ = x + env.dt * ẋ
ẋ_ = ẋ + env.dt * ẍ
d_ = d + a
sp = vcat(x_, ẋ_, d_)
reward = abs(env.target - x) * -1
return (sp=sp, r=reward)
end
function POMDPs.initialstate(mdp::SliderMDP)
ImplicitDistribution((rng) -> Float32.([rand(rng, mdp.x0), rand(rng, mdp.v0), rand(rng, mdp.d0)]))
end
POMDPs.isterminal(mdp::SliderMDP, s) = s[2] <= 0
POMDPs.discount(mdp::SliderMDP) = mdp.γ
mdp = SliderMDP();
s = state_space(mdp); # Using Crux.jl
function Q_network()
layer1 = Dense(3, 64, relu)
layer2 = Dense(64, 64, relu)
layer3 = Dense(64, length(3))
return DiscreteNetwork(Chain(layer1, layer2, layer3), [-.1, 0, .1])
end
solver_dqn = DQN(π=Q_network(), S=s, N=30000) # Using Crux.jl
policy_dqn = solve(solver_dqn, mdp) # Error comes here
Stacktrace:
policy_dqn
MethodError: no method matching logpdf(::Distributions.Categorical{Float64, Vector{Float64}}, ::Nothing)
Closest candidates are:
logpdf(::Distributions.DiscreteNonParametric, !Matched::Real) at C:\Users\name\.julia\packages\Distributions\Xrm9e\src\univariate\discrete\discretenonparametric.jl:106
logpdf(::Distributions.UnivariateDistribution{S} where S<:Distributions.ValueSupport, !Matched::AbstractArray) at deprecated.jl:70
logpdf(!Matched::POMDPPolicies.PlaybackPolicy, ::Any) at C:\Users\name\.julia\packages\POMDPPolicies\wMOK3\src\playback.jl:34
...
logpdf(::Crux.ObjectCategorical, ::Float32)#utils.jl:16
logpdf(::Crux.DistributionPolicy, ::Vector{Float64}, ::Float32)#policies.jl:305
var"#exploration#133"(::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, ::typeof(Crux.exploration), ::Crux.DistributionPolicy, ::Vector{Float64})#policies.jl:302
exploration#policies.jl:297[inlined]
action(::Crux.DistributionPolicy, ::Vector{Float64})#policies.jl:294
var"#exploration#136"(::Crux.DiscreteNetwork, ::Int64, ::typeof(Crux.exploration), ::Crux.MixedPolicy, ::Vector{Float64})#policies.jl:326
var"#step!#173"(::Bool, ::Int64, ::typeof(Crux.step!), ::Dict{Symbol, Array}, ::Int64, ::Crux.Sampler{Main.workspace#2.SliderMDP, Vector{Float32}, Crux.DiscreteNetwork, Crux.ContinuousSpace{Tuple{Int64}}, Crux.DiscreteSpace})#sampler.jl:55
var"#steps!#174"(::Int64, ::Bool, ::Int64, ::Bool, ::Bool, ::Bool, ::typeof(Crux.steps!), ::Crux.Sampler{Main.workspace#2.SliderMDP, Vector{Float32}, Crux.DiscreteNetwork, Crux.ContinuousSpace{Tuple{Int64}}, Crux.DiscreteSpace})#sampler.jl:108
var"#fillto!#177"(::Int64, ::Bool, ::typeof(Crux.fillto!), ::Crux.ExperienceBuffer{Array}, ::Crux.Sampler{Main.workspace#2.SliderMDP, Vector{Float32}, Crux.DiscreteNetwork, Crux.ContinuousSpace{Tuple{Int64}}, Crux.DiscreteSpace}, ::Int64)#sampler.jl:156
solve(::Crux.OffPolicySolver, ::Main.workspace#2.SliderMDP)#off_policy.jl:86
top-level scope#Local: 1[inlined]
Short answer:
Change your output vector to Float32 i.e. Float32[-.1, 0, .1].
Long answer:
Crux creates a Distribution over your network's output values, and at some point (policies.jl:298) samples a random value from it. It then converts this value to a Float32. Later (utils.jl:15) it does a findfirst to find the index of this value in the original output array (stored as objs within the distribution), but because the original array is still Float64, this fails and returns a nothing. Hence the error.
I believe this (converting the sampled value but not the objs array and/or not using approximate equality check i.e. findfirst(isapprox(x), d.objs)) to be a bug in the package, and would encourage you to raise this as an issue on Github.

Keras: connecting two layers from different models to create new model

What I am trying to do:
I want to connect any Layers from different models to create a new keras model.
What I found so far:
https://github.com/keras-team/keras/issues/4205: using the Model's call class to change the input of another model. My problems with this approach:
Can only change the input of the Model, no other layers. So if I want to cut off some layers at the beginning of the encoder, that is not possible
Not a fan of the nested array structure when getting the config file. Would prefer to have a 1D-array
When using model.summary() or plot_model(), the encoder only shows as "Model". If anything I would say both models should be wrapped. So the config should show [model_base, model_encoder] and not [base_input, base_conv2D, ..., encoder_model]
To be fair, with this approach: https://github.com/keras-team/keras/issues/3021, the point above is actually possible, but again, it is very inflexible. As soon as I want to cut off some layers at the top or bottom of the base or encoder network, this approach fails
https://github.com/keras-team/keras/issues/3465: Adding new layers to a base model by using any output of the base model. Problems here:
While it is possible to use any layer from the base model, which means I can cut off layers from the base model, I can not load the encoder as a keras model. The top models always must be created new.
What I have tried:
My approach to connecting any layers from different models:
Clear inbound nodes of input layer
use the call() method of the output layer with the tensor of the output layer
Clean up the outbound nodes of the output tensor by switching out the new created tensor with the previous output tensor
I was really optimistic at first, as the summary() and the plot_model() got me exactly what I wanted, thus the Node graph should be fine, right? But I ran into errors when training. While the approach in the "What I found so far" section trained fine, I ran into an error with my approach. This is the error message:
File "C:\Anaconda\envs\dlpipe\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 508, in apply_op
(input_name, err))
ValueError: Tried to convert 'x' to a tensor and failed. Error: None values not supported.
Might be an important info, that I am using Tensorflow as backend. I was able to trace back the root of this error. It seems like there is an error when the gradients are calculated. Usually, there is a gradient calculation for each node, but all the nodes of the base network have "None" when using my approach. So basically in keras/optimizers.py, get_updates() when the gradients are calculated (grad = self.get_gradients(loss, params)).
Here is the code (without the training), with all three approaches implemented:
def create_base():
in_layer = Input(shape=(32, 32, 3), name="base_input")
x = Conv2D(32, (3, 3), padding='same', activation="relu", name="base_conv2d_1")(in_layer)
x = Conv2D(32, (3, 3), padding='same', activation="relu", name="base_conv2d_2")(x)
x = MaxPooling2D(pool_size=(2, 2), name="base_maxpooling_2d_1")(x)
x = Dropout(0.25, name="base_dropout")(x)
x = Conv2D(64, (3, 3), padding='same', activation="relu", name="base_conv2d_3")(x)
x = Conv2D(64, (3, 3), padding='same', activation="relu", name="base_conv2d_4")(x)
x = MaxPooling2D(pool_size=(2, 2), name="base_maxpooling2d_2")(x)
x = Dropout(0.25, name="base_dropout_2")(x)
return Model(inputs=in_layer, outputs=x, name="base_model")
def create_encoder():
in_layer = Input(shape=(8, 8, 64))
x = Flatten(name="encoder_flatten")(in_layer)
x = Dense(512, activation="relu", name="encoder_dense_1")(x)
x = Dropout(0.5, name="encoder_dropout_2")(x)
x = Dense(10, activation="softmax", name="encoder_dense_2")(x)
return Model(inputs=in_layer, outputs=x, name="encoder_model")
def extend_base(input_model):
x = Flatten(name="custom_flatten")(input_model.output)
x = Dense(512, activation="relu", name="custom_dense_1")(x)
x = Dropout(0.5, name="custom_dropout_2")(x)
x = Dense(10, activation="softmax", name="custom_dense_2")(x)
return Model(inputs=input_model.input, outputs=x, name="custom_edit")
def connect_layers(from_tensor, to_layer, clear_inbound_nodes=True):
try:
tmp_output = to_layer.output
except AttributeError:
raise ValueError("Connecting to shared layers is not supported!")
if clear_inbound_nodes:
to_layer.inbound_nodes = []
else:
tensor_list = to_layer.inbound_nodes[0].input_tensors
tensor_list.append(from_tensor)
from_tensor = tensor_list
to_layer.inbound_nodes = []
new_output = to_layer(from_tensor)
for out_node in to_layer.outbound_nodes:
for i, in_tensor in enumerate(out_node.input_tensors):
if in_tensor == tmp_output:
out_node.input_tensors[i] = new_output
if __name__ == "__main__":
base = create_base()
encoder = create_encoder()
#new_model_1 = Model(inputs=base.input, outputs=encoder(base.output))
#plot_model(new_model_1, to_file="plots/new_model_1.png")
new_model_2 = extend_base(base)
plot_model(new_model_2, to_file="plots/new_model_2.png")
print(new_model_2.summary())
base_layer = base.get_layer("base_dropout_2")
top_layer = encoder.get_layer("encoder_flatten")
connect_layers(base_layer.output, top_layer)
new_model_3 = Model(inputs=base.input, outputs=encoder.output)
plot_model(new_model_3, to_file="plots/new_model_3.png")
print(new_model_3.summary())
I know this is a lot of text and a lot of code. But I feel like it is needed to explain the issue here.
EDIT: I just tried thenao and I think the error gives away more information:
theano.gradient.DisconnectedInputError:
Backtrace when that variable is created:
It seems like every layer from the encoder model has some connection with the encoder input layer via TensorVariables.
So this is what I ended up with for the connect_layer() function:
def connect_layers(from_tensor, to_layer, old_tensor=None):
# if there is any shared layer after the to_layer, it is not supported
try:
tmp_output = to_layer.output
except AttributeError:
raise ValueError("Connecting to shared layers is not supported!")
# check if to_layer has multiple input_tensors, and therefore some sort of merge layer
if len(to_layer.inbound_nodes[0].input_tensors) > 1:
tensor_list = to_layer.inbound_nodes[0].input_tensors
found_tensor = False
for i, tensor in enumerate(tensor_list):
# exchange the old tensor with the new created tensor
if tensor == old_tensor:
tensor_list[i] = from_tensor
found_tensor = True
break
if not found_tensor:
tensor_list.append(from_tensor)
from_tensor = tensor_list
to_layer.inbound_nodes = []
else:
to_layer.inbound_nodes = []
new_output = to_layer(from_tensor)
tmp_out_nodes = to_layer.outbound_nodes[:]
to_layer.outbound_nodes = []
# recursively connect all layers after the current to_layer
for out_node in tmp_out_nodes:
l = out_node.outbound_layer
print("Connecting: " + str(to_layer) + " ----> " + str(l))
connect_layers(new_output, l, tmp_output)
As each Tensor has all the information about it's root tensor via -> owner.inputs -> owner.inputs -> ..., all tensor following the new_output tensor must be updated.
It was a lot easier to debug that with theano then with tensorflow backend.
I still need to figure out how to deal with shared layers. With the current implementation it is not possible to connect other models that contain a shared layer after the first to_layer.

Tensorflow: Variable rnn/basic_lstm_cell/kernel already exists, disallowed

Here is a part of my tensorflow RNN network code written in jupyter. The whole code runs perfect for the first time, however, running it furthermore produces an error. The code:
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
def recurrent_nn_model(x):
x = tf.transpose(x, [1,0,2])
x = tf.reshape(x, [-1, chunk_size])
x = tf.split(x, n_chunks, 0)
lstm_layer = {'hidden_state': tf.zeros([n_batches, lstm_size]),
'current_state': tf.zeros([n_batches, lstm_size])}
layer = {'weights': tf.Variable(tf.random_normal([lstm_size, n_classes])),
'biases': tf.Variable(tf.random_normal([n_classes]))}
lstm = rnn_cell.BasicLSTMCell(lstm_size)
rnn_outputs, rnn_states = rnn.static_rnn(lstm, x, dtype=tf.float32)
output = tf.add(tf.matmul(rnn_outputs[-1], layer['weights']),
layer['biases'])
return output
The error is:
Variable rnn/basic_lstm_cell/kernel already exists, disallowed. Did
you mean to set reuse=True in VarScope? Originally defined at:
If recurrent_nn_model is the whole network, just add this line to reset previously defined graph:
tf.reset_default_graph()
If you're intentionally calling recurrent_nn_model several times and combine these RNNs into one graph, you should use different variable scopes for each one:
with tf.variable_scope('lstm1'):
recurrent_nn_model(x1)
with tf.variable_scope('lstm2'):
recurrent_nn_model(x2)

Resources