How can I synchronize two Deep Reinforcement Learning agents?


I am working on a project in which I simulate a computer network. Each node of the network is a Deep Reinforcement Learning agent whose state depends on a global matrix: every agent reads data from that matrix and then modifies it. I would like to know when the most appropriate time to update the state of these agents is, and which approach would be the most correct.
The state has one more row than the matrix containing the MLU of the links. This extra row stores the packet to be worked on.
# Assumed imports (not shown in the original post)
import numpy as np
from gym import Env
from gym.spaces import Box, Discrete

# Create a matrix that stores the MLU at each moment,
# initialized to 0 for connected nodes (-1 everywhere else)
matrizMLU = np.full((nodos_red, nodos_red), -1, int)
for i in range(nodos_red):
    for j in range(i+1, nodos_red):
        if j in puertos[i]:
            matrizMLU[i][j] = 0
            matrizMLU[j][i] = 0

class nodoEnv(Env):
    def __init__(self, idNodo):  # Environment initialization
        self.id = idNodo
        self.action_space = Discrete(5)  # Actions
        self.observation_space = Box(low=0, high=100, shape=(len(matrizMLU)+1, len(matrizMLU)))
        self.estado = np.zeros((len(matrizMLU)+1, len(matrizMLU)), dtype=int)
        self.camino = calcularCaminos(idNodo)
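One common way to keep several agents consistent with a shared matrix is a synchronous step: every agent observes the same frozen snapshot of the global matrix, all actions are chosen, and only then are the changes applied before the next observation. The sketch below illustrates that ordering only. `observe`, `choose_action`, `apply_action`, and the random policy are hypothetical stand-ins, not part of the code above, and whether synchronous updates suit this particular simulation is an assumption.

```python
import numpy as np

def observe(global_mlu, packet_row):
    """Build an agent state: the shared MLU matrix plus one extra row for the packet."""
    return np.vstack([global_mlu, packet_row])

def choose_action(state, rng, n_actions=5):
    """Placeholder policy (random); a trained DRL agent would go here."""
    return int(rng.integers(n_actions))

def apply_action(global_mlu, node_id, action):
    """Hypothetical effect of an action: add load to one link of the acting node."""
    neighbour = action % global_mlu.shape[0]
    if global_mlu[node_id, neighbour] >= 0:  # -1 marks "no link"
        global_mlu[node_id, neighbour] += 1
        global_mlu[neighbour, node_id] += 1

def synchronous_step(global_mlu, packets, rng):
    """All agents observe the same snapshot, then all actions are applied."""
    snapshot = global_mlu.copy()  # frozen view for this time step
    actions = [choose_action(observe(snapshot, packets[i]), rng)
               for i in range(global_mlu.shape[0])]
    for node_id, action in enumerate(actions):
        apply_action(global_mlu, node_id, action)
    return actions

rng = np.random.default_rng(0)
mlu = np.array([[-1, 0, 0], [0, -1, -1], [0, -1, -1]])  # node 0 linked to 1 and 2
packets = np.zeros((3, 3), dtype=int)
actions = synchronous_step(mlu, packets, rng)
```

The alternative is asynchronous updating, where each agent reads and writes the live matrix in turn; that makes the result depend on the order in which agents act.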

Related

LassoCV getting axis -1 is out of bounds for array of dimension 0 and other questions

Good evening to all,
I am trying to implement for the first time LassoCV with sklearn.
My code is as follows:
numeric_features = ['AGE_2019', 'Inhabitants']
categorical_features = ['familty_type','studying','Job_42','sex','DEGREE', 'Activity_type', 'Nom de la commune', 'city_type', 'DEP', 'INSEE', 'Nom du département', 'reg', 'Nom de la région']
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', MinMaxScaler())  # Scale the data to [0, 1]
])
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('encoder', OneHotEncoder(handle_unknown='ignore'))  # Create binary variables for the categorical variables
])
preprocessor = ColumnTransformer(transformers=[
    ('numeric', numeric_transformer, numeric_features),
    ('categorical', categorical_transformer, categorical_features)
])
# Creation of the pipeline
lassocv_piped = Pipeline([
    ('preprocessor', preprocessor),
    ('model', LassoCV())
])
# Creation of the grid of parameters
dt_params = {'model__alphas': np.array([0.5])
}
cv_folds = KFold(n_splits=5, shuffle=True, random_state=0)
lassocv_grid_piped = GridSearchCV(lassocv_piped, dt_params, cv=cv_folds, n_jobs=-1, scoring=['neg_mean_squared_error', 'r2'], refit='r2')
# Fitting our model
lassocv_grid_piped.fit(df_X_train, df_Y_train.values.ravel())
# Getting our metrics and predictions
Y_pred_lassocv = lassocv_grid_piped.predict(df_X_test)
metrics_lassocv = lassocv_grid_piped.cv_results_
best_lassocv_parameters = lassocv_grid_piped.best_params_
print('Best test negative MSE of the base model : ', max(metrics_lassocv['mean_test_neg_mean_squared_error']))
print('Best test R^2 of the base model : ', max(metrics_lassocv['mean_test_r2']))
print('Best parameters of the base model : ', best_lassocv_parameters)
# Graphical representation
results = pd.DataFrame(dt_params)
for k in range(5):
    results = pd.concat([results,
                         pd.DataFrame(lassocv_grid_piped.cv_results_['split'+str(k)+'_test_neg_mean_squared_error'])], axis=1)
sns.relplot(data=results.melt('model__alphas', value_name='neg_mean_squared_error'), x='model__alphas', y='neg_mean_squared_error', kind='line')
I am still a novice when it comes to using this model. So, I have some questions about the use of this estimator:
Is it useful to use a cv_fold outside the estimator, as I do?
Is it useful to set up a GridSearchCV to test the different alpha values?
How is it possible to extract the R^2 from our model?
Also, I encounter this error:
AxisError: axis -1 is out of bounds for array of dimension 0
Would you have an idea to solve it?
I wish you a good evening!

After a good night's sleep, I was able to overcome some of my problems.
Is it useful to use a cv_fold outside the estimator, as I do?
After studying the documentation of LassoCV a bit, it seems not. So I could remove cv_fold from my code. Instead, I could use the cv argument of LassoCV.
Is it useful to set up a GridSearchCV to test the different alpha values?
I haven't really been able to answer that question yet. It seems that LassoCV does it by itself.
How is it possible to extract the R^2 from our model?
This can be done simply with the function: .score(X,y).
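For regressors, scikit-learn's `.score(X, y)` returns the coefficient of determination, R^2 = 1 - SS_res / SS_tot. As a sanity check, the same number can be computed by hand from predictions. The helper `r_squared` and the toy values below are illustrative only and do not come from the original data.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.0]
r2 = r_squared(y_true, y_pred)  # 1 - 0.5 / 20 = 0.975
```

Scoring on the held-out set (`df_X_test` with its matching targets) rather than on the training set gives a less optimistic estimate.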
As for the error message, I was able to get rid of it once I removed GridSearchCV.
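A likely cause of the AxisError (an assumption, since the full traceback is not shown): GridSearchCV treats each element of a grid entry as one candidate, so `np.array([0.5])` yields the scalar `0.5` as the value of `alphas`, whereas LassoCV expects an array-like and, as far as I know, sorts it internally. Sorting a 0-dimensional value with NumPy raises the same kind of error:

```python
import numpy as np

grid_values = np.array([0.5])
candidates = list(grid_values)  # a grid search iterates over the elements: scalars

try:
    np.sort(candidates[0])      # sorting a 0-dimensional value
    message = None
except ValueError as err:       # NumPy's AxisError subclasses ValueError
    message = str(err)

# Wrapping the array in a list keeps it whole as a single candidate
wrapped = [np.array([0.5])]
sorted_ok = np.sort(wrapped[0])
```

If the grid search were kept, writing the entry as `{'model__alphas': [np.array([0.5])]}` would pass the whole array as one candidate.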
Here's my final code:
numeric_features = ['AGE_2019', 'Inhabitants']
categorical_features = ['familty_type','studying','Job_42','sex','DEGREE', 'Activity_type', 'Nom de la commune', 'city_type', 'DEP', 'INSEE', 'Nom du département', 'reg', 'Nom de la région']
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', MinMaxScaler())  # Scale the data to [0, 1]
])
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('encoder', OneHotEncoder(handle_unknown='ignore'))  # Create binary variables for the categorical variables
])
preprocessor = ColumnTransformer(
    transformers=[
        ('numeric', numeric_transformer, numeric_features),
        ('categorical', categorical_transformer, categorical_features)
    ])
# Creation of the pipeline
list_metrics_lassocv = []
list_best_lassocv_parameters = []
for i in range(1, 12):
    lassocv_piped = Pipeline([
        ('preprocessor', preprocessor),
        ('model', LassoCV(cv=5, n_alphas=i, random_state=0))
    ])
    # Fitting our model
    lassocv_piped.fit(df_X_train, df_Y_train.values.ravel())
    # Getting our metrics and predictions
    Y_pred_lassocv = lassocv_piped.predict(df_X_test)
    metrics_lassocv = lassocv_piped.score(df_X_train, df_Y_train.values.ravel())
    best_lassocv_parameters = lassocv_piped['model'].alpha_
    list_metrics_lassocv.append(metrics_lassocv)
    list_best_lassocv_parameters.append(best_lassocv_parameters)
Do not hesitate to correct me if you see an imprecision or an error.

Handling error with regressions inside a parallel foreach loop

Hi, I am having issues with a foreach loop in which each iteration estimates regressions on a subset of the data, with a different list of controls, for several outcomes. The problem is that for some outcomes in some countries I only have missing values, so the regression function returns an error. I would like to be able to run the loop and get output with NAs, or a string saying "Error", in place of the coefficient table. I tried several things, but they don't quite work with the .combine = rbind option, and if I use .combine = c I get very messy output. Thanks in advance for any help.
reg <- function(y, d, c){
  if (missing(c))
    feols(as.formula(paste0(y, "~ 0 + treatment")), data = d)
  else {
    feols(as.formula(paste0(y, "~ 0 + treatment + ", c)), data = d)
  }
}
# Here we set up the parallelization to run the code on the server
n.cores <- 9 #parallel::detectCores() - 1
#create the cluster
my.cluster <- parallel::makeCluster(
  n.cores,
  type = "PSOCK"
)
# print(my.cluster)
#register it to be used by %dopar%
doParallel::registerDoParallel(cl = my.cluster)
# #check if it is registered (optional)
# foreach::getDoParRegistered()
# #how many workers are available? (optional)
# foreach::getDoParWorkers()
# Here is the cycle to parallel regress each outcome on the global treatment
# variable for each RCT with strata control
tables <- foreach(
  n = 1:9, .combine = rbind, .packages = c('data.table', 'fixest'),
  .errorhandling = "pass"
) %dopar% {
  dt_target <- dt[country == n]
  c <- controls[n]
  est <- lapply(outcomes, function(x) reg(y = x, d = dt_target, c))
  table <- etable(est, drop = "!treatment", cluster = "uid", fitstat = "n")
  table
}
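The general pattern for this kind of problem is to catch the failure inside each iteration and return a placeholder with the same shape as a normal result, so that row-wise combination still works. In R this is typically done by wrapping the model call in `tryCatch()`. The sketch below shows the pattern in Python purely as a language-neutral illustration; `fit_one`, `safe_row`, the toy data, and the column names are hypothetical.

```python
import math

def fit_one(values):
    """Toy stand-in for a regression: the mean of the non-missing values."""
    clean = [v for v in values if not math.isnan(v)]
    if not clean:
        raise ValueError("only missing values")
    return sum(clean) / len(clean)

def safe_row(country, outcome, values):
    """Always return a row with the same keys, whether or not the fit succeeded."""
    try:
        return {"country": country, "outcome": outcome,
                "estimate": fit_one(values), "status": "ok"}
    except ValueError as err:
        return {"country": country, "outcome": outcome,
                "estimate": math.nan, "status": f"Error: {err}"}

data = {
    (1, "y1"): [1.0, 2.0, 3.0],
    (1, "y2"): [math.nan, math.nan],  # outcome entirely missing for this country
}
table = [safe_row(c, o, v) for (c, o), v in data.items()]
```

Because every row has identical keys, the rows can be stacked into one table regardless of which fits failed.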

no method matching logpdf when sampling from uniform distribution

I am trying to use reinforcement learning in julia to teach a car that is constantly being accelerated backwards (but with a positive initial velocity) to apply brakes so that it gets as close to a target distance as possible before moving backwards.
To do this, I am making use of POMDPs.jl and crux.jl which has many solvers (I'm using DQN). I will list what I believe to be the relevant parts of the script first, and then more of it towards the end.
To define the MDP, I set the initial position, velocity, and force from the brakes as a uniform distribution over some values.
@with_kw struct SliderMDP <: MDP{Array{Float32}, Array{Float32}}
    x0 = Distributions.Uniform(0., 80.) # Distribution to sample initial position
    v0 = Distributions.Uniform(0., 25.) # Distribution to sample initial velocity
    d0 = Distributions.Uniform(0., 2.) # Distribution to sample brake force
    ...
end
My state holds the values of (position, velocity, brake force), and the initial state is given as:
function POMDPs.initialstate(mdp::SliderMDP)
    ImplicitDistribution((rng) -> Float32.([rand(rng, mdp.x0), rand(rng, mdp.v0), rand(rng, mdp.d0)]))
end
Then, I set up my DQN solver using crux.jl and called a function to solve for the policy
solver_dqn = DQN(π=Q_network(), S=s, N=30000)
policy_dqn = solve(solver_dqn, mdp)
Calling solve() gives me the error MethodError: no method matching logpdf(::Distributions.Categorical{Float64, Vector{Float64}}, ::Nothing). I am quite sure that this comes from the initial state sampling, but I am not sure why or how to fix it. I have only been learning RL from various books and online lectures for a very short time, so any help regarding the error, the model I set up, or anything else I'm oblivious to would be appreciated.
More comprehensive code:
Packages:
using POMDPs
using POMDPModelTools
using POMDPPolicies
using POMDPSimulators
using Parameters
using Random
using Crux
using Flux
using Distributions
Rest of it:
@with_kw struct SliderMDP <: MDP{Array{Float32}, Array{Float32}}
    x0 = Distributions.Uniform(0., 80.) # Distribution to sample initial position
    v0 = Distributions.Uniform(0., 25.) # Distribution to sample initial velocity
    d0 = Distributions.Uniform(0., 2.) # Distribution to sample brake force
    m::Float64 = 1.
    tension::Float64 = 3.
    dmax::Float64 = 2.
    target::Float64 = 80.
    dt::Float64 = .05
    γ::Float32 = 1.
    actions::Vector{Float64} = [-.1, 0., .1]
end
function POMDPs.gen(env::SliderMDP, s, a, rng::AbstractRNG = Random.GLOBAL_RNG)
    x, ẋ, d = s
    if x >= env.target
        a = .1
    end
    if d+a >= env.dmax || d+a <= 0
        a = 0.
    end
    force = (d + env.tension) * -1
    ẍ = force/env.m
    # Simulation
    x_ = x + env.dt * ẋ
    ẋ_ = ẋ + env.dt * ẍ
    d_ = d + a
    sp = vcat(x_, ẋ_, d_)
    reward = abs(env.target - x) * -1
    return (sp=sp, r=reward)
end
function POMDPs.initialstate(mdp::SliderMDP)
    ImplicitDistribution((rng) -> Float32.([rand(rng, mdp.x0), rand(rng, mdp.v0), rand(rng, mdp.d0)]))
end
POMDPs.isterminal(mdp::SliderMDP, s) = s[2] <= 0
POMDPs.discount(mdp::SliderMDP) = mdp.γ
mdp = SliderMDP();
s = state_space(mdp); # Using Crux.jl
function Q_network()
    layer1 = Dense(3, 64, relu)
    layer2 = Dense(64, 64, relu)
    layer3 = Dense(64, length(3)) # note: length(3) evaluates to 1, not 3
    return DiscreteNetwork(Chain(layer1, layer2, layer3), [-.1, 0, .1])
end
solver_dqn = DQN(π=Q_network(), S=s, N=30000) # Using Crux.jl
policy_dqn = solve(solver_dqn, mdp) # Error comes here
Stacktrace:
policy_dqn
MethodError: no method matching logpdf(::Distributions.Categorical{Float64, Vector{Float64}}, ::Nothing)
Closest candidates are:
logpdf(::Distributions.DiscreteNonParametric, !Matched::Real) at C:\Users\name\.julia\packages\Distributions\Xrm9e\src\univariate\discrete\discretenonparametric.jl:106
logpdf(::Distributions.UnivariateDistribution{S} where S<:Distributions.ValueSupport, !Matched::AbstractArray) at deprecated.jl:70
logpdf(!Matched::POMDPPolicies.PlaybackPolicy, ::Any) at C:\Users\name\.julia\packages\POMDPPolicies\wMOK3\src\playback.jl:34
...
logpdf(::Crux.ObjectCategorical, ::Float32)@utils.jl:16
logpdf(::Crux.DistributionPolicy, ::Vector{Float64}, ::Float32)@policies.jl:305
var"#exploration#133"(::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, ::typeof(Crux.exploration), ::Crux.DistributionPolicy, ::Vector{Float64})@policies.jl:302
exploration@policies.jl:297[inlined]
action(::Crux.DistributionPolicy, ::Vector{Float64})@policies.jl:294
var"#exploration#136"(::Crux.DiscreteNetwork, ::Int64, ::typeof(Crux.exploration), ::Crux.MixedPolicy, ::Vector{Float64})@policies.jl:326
var"#step!#173"(::Bool, ::Int64, ::typeof(Crux.step!), ::Dict{Symbol, Array}, ::Int64, ::Crux.Sampler{Main.workspace#2.SliderMDP, Vector{Float32}, Crux.DiscreteNetwork, Crux.ContinuousSpace{Tuple{Int64}}, Crux.DiscreteSpace})@sampler.jl:55
var"#steps!#174"(::Int64, ::Bool, ::Int64, ::Bool, ::Bool, ::Bool, ::typeof(Crux.steps!), ::Crux.Sampler{Main.workspace#2.SliderMDP, Vector{Float32}, Crux.DiscreteNetwork, Crux.ContinuousSpace{Tuple{Int64}}, Crux.DiscreteSpace})@sampler.jl:108
var"#fillto!#177"(::Int64, ::Bool, ::typeof(Crux.fillto!), ::Crux.ExperienceBuffer{Array}, ::Crux.Sampler{Main.workspace#2.SliderMDP, Vector{Float32}, Crux.DiscreteNetwork, Crux.ContinuousSpace{Tuple{Int64}}, Crux.DiscreteSpace}, ::Int64)@sampler.jl:156
solve(::Crux.OffPolicySolver, ::Main.workspace#2.SliderMDP)@off_policy.jl:86
top-level scope@Local: 1[inlined]

Short answer:
Change your output vector to Float32 i.e. Float32[-.1, 0, .1].
Long answer:
Crux creates a Distribution over your network's output values, and at some point (policies.jl:298) samples a random value from it. It then converts this value to a Float32. Later (utils.jl:15) it does a findfirst to find the index of this value in the original output array (stored as objs within the distribution), but because the original array is still Float64, this fails and returns a nothing. Hence the error.
I believe this (converting the sampled value but not the objs array and/or not using approximate equality check i.e. findfirst(isapprox(x), d.objs)) to be a bug in the package, and would encourage you to raise this as an issue on Github.
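The mismatch is easy to reproduce outside Julia: a value rounded to 32-bit precision no longer compares equal to its 64-bit original, so an exact lookup fails while an approximate one succeeds. The NumPy sketch below is only an analogy for the `findfirst` behaviour described above, not Crux code, and the variable names are illustrative.

```python
import numpy as np

objs = np.array([-0.1, 0.0, 0.1])   # action values stored in 64-bit precision
sampled = np.float32(objs[0])       # the sampled value after conversion to 32-bit

# Exact lookup against the 64-bit array: no element matches
exact = np.flatnonzero(objs == np.float64(sampled))

# Approximate lookup tolerates the rounding difference
approx = np.flatnonzero(np.isclose(objs, sampled))

# Storing the values in 32-bit precision from the start makes the exact lookup work
objs32 = objs.astype(np.float32)
exact32 = np.flatnonzero(objs32 == sampled)
```

The last case corresponds to the short answer: once both sides use the same precision, the exact comparison finds the value.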

Tensorflow LSTM PTB Example - Understanding forward and backward pass

Right now I am going through the TensorFlow example on LSTMs, which uses the PTB dataset to build an LSTM network capable of predicting the next word. I've spent a lot of time trying to understand the code and have a good understanding of most of it; however, there is one function that I don't fully grasp:
def run_epoch(session, model, eval_op=None, verbose=False):
    """Runs the model on the given data."""
    costs = 0.0
    iters = 0
    state = session.run(model.initial_state)
    fetches = {
        "cost": model.cost,
        "final_state": model.final_state,
    }
    if eval_op is not None:
        fetches["eval_op"] = eval_op
    for step in range(model.input.epoch_size):
        feed_dict = {}
        for i, (c, h) in enumerate(model.initial_state):
            feed_dict[c] = state[i].c
            feed_dict[h] = state[i].h
        vals = session.run(fetches, feed_dict)
        cost = vals["cost"]
        state = vals["final_state"]
        costs += cost
        iters += model.input.num_steps
    return np.exp(costs / iters)
My confusion is this: each time through the outer loop, I believe we have processed batch_size * num_steps words and done both the forward and the backward propagation. But how, in the next iteration, do we know to start with, for example, the 36th word of each batch if num_steps = 35? I suspect some attribute of the model class changes on each iteration, but I cannot figure out which. Thanks for your help.
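In the version of the tutorial I am aware of, the position is not tracked by `run_epoch` at all: the input pipeline (`reader.ptb_producer`) reshapes the corpus into `batch_size` rows and uses a queue-based counter, so each `session.run` returns the next `num_steps`-wide slice. The NumPy sketch below mimics that slicing under this assumption; `iterate_batches` is a hypothetical name, not a TensorFlow function.

```python
import numpy as np

def iterate_batches(raw_data, batch_size, num_steps):
    """Yield (x, y) windows the way a PTB-style producer slices the corpus."""
    raw_data = np.asarray(raw_data)
    batch_len = len(raw_data) // batch_size
    data = raw_data[:batch_size * batch_len].reshape(batch_size, batch_len)
    epoch_size = (batch_len - 1) // num_steps
    for i in range(epoch_size):  # i plays the role of the queue counter
        x = data[:, i * num_steps:(i + 1) * num_steps]
        y = data[:, i * num_steps + 1:(i + 1) * num_steps + 1]  # next-word targets
        yield x, y

# Toy corpus of 20 word ids, 2 rows, windows of 3 steps
batches = list(iterate_batches(np.arange(20), batch_size=2, num_steps=3))
```

Because each row continues where its previous window ended, feeding `final_state` back in as the next `initial_state` carries the LSTM state across consecutive windows.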

Stata: multiplying each variable of a set of time-series variables with the corresponding variable of another set

Being fairly new to Stata, I'm having difficulty figuring out how to do the following:
I have time-series data on selling price (p) and quantity sold (q) for 10 products in a single data file (i.e., 20 variables, p01-p10 and q01-q10). I am struggling to find the appropriate Stata command to compute the sales revenue (pq) time series for each of these 10 products (i.e., pq01-pq10).
Many thanks for your help.

forval i = 1/10 {
    local j : display %02.0f `i'
    gen pq`j' = p`j' * q`j'
}
A standard loop over 1/10 won't get you the leading zero in 01/09. For that we need to use an appropriate format. See also
@article{pr0051,
  author = "Cox, N. J.",
  title = "Stata tip 85: Looping over nonintegers",
  journal = "Stata Journal",
  publisher = "Stata Press",
  address = "College Station, TX",
  volume = "10",
  number = "1",
  year = "2010",
  pages = "160-163(4)",
  url = "http://www.stata-journal.com/article.html?article=pr0051"
}
(added later) Another way to do it is
local j = string(`i', "%02.0f")
That makes it a bit more explicit that you are mapping from numbers 1,...,10 to strings "01",...,"10".
