How to concatenate ImageFolder datasets in PyTorch?

I have a question regarding ConcatDataset in PyTorch, and I could not find any answers that would help me solve this. I have this code:
train_val_folder = root + "train_val_data/"
train_set1_original = datasets.ImageFolder(train_val_folder, transform=train_val_transform_croponly)
train_val_modified = datasets.ImageFolder(train_val_folder, transform=train_val_transform)
train_val_dataset = torch.utils.data.ConcatDataset([train_set1_original, train_val_modified])
classes = train_val_dataset.classes
train_val_length = len(train_val_dataset)
print(train_val_length)
AttributeError: 'ConcatDataset' object has no attribute 'classes'
Is there an easier way to concatenate the original and transformed datasets while preserving .classes from ImageFolder?
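One workaround, as a sketch reusing the variables from the question: ConcatDataset keeps its children in a .datasets attribute, and since both ImageFolder instances are built from the same train_val_folder, their class lists are identical, so .classes can be read from either child.
train_val_dataset = torch.utils.data.ConcatDataset([train_set1_original, train_val_modified])
# ConcatDataset does not merge child attributes; both children share the
# same folder, hence the same class list
classes = train_val_dataset.datasets[0].classes
assert classes == train_val_modified.classes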

Related

Rewriting ParamSet ids from mlr3::paradox()

Let's say I have the following ParamSet object:
my_ps = paradox::ps(
  minsplit = p_int(1, 64, logscale = TRUE),
  cp = p_dbl(1e-04, 1, logscale = TRUE)
)
Is it possible to rename minsplit to survTree.minsplit without changing anything else?
The reason is that I use some learners as part of a GraphLearner, so their parameter names changed, and I would like some code that adds the learner$id in front of the parameter names for later tuning (rather than rewriting them from scratch with the new names).
I think I have a partial solution here. It is only partial, because it does not support the transformation.
Where it works:
library(paradox)
my_ps = paradox::ps(
  minsplit = p_int(1, 64),
  cp = p_dbl(1e-04, 1)
)
my_ps$set_id = "john"
my_psc = ParamSetCollection$new(list(my_ps))
print(my_psc)
#> <ParamSetCollection>
#> id class lower upper nlevels default value
#> 1: john.minsplit ParamInt 1e+00 64 64 <NoDefault[3]>
#> 2: john.cp ParamDbl 1e-04 1 Inf <NoDefault[3]>
Created on 2022-12-07 by the reprex package (v2.0.1)
Where it does not:
library(paradox)
my_ps = paradox::ps(
  minsplit = p_int(1, 64, logscale = TRUE),
  cp = p_dbl(1e-04, 1)
)
my_ps$set_id = "john"
my_psc = ParamSetCollection$new(list(my_ps))
#> Error in .__ParamSetCollection__initialize(self = self, private = private, : Building a collection out sets, where a ParamSet has a trafo is currently unsupported!
Created on 2022-12-07 by the reprex package (v2.0.1)
The underlying problem is that we have not yet solved how to reconcile the parameter transformations of individual ParamSets with a possible parameter transformation of the ParamSetCollection.
I fear that there is currently no neat solution for your problem.
Sorry, I cannot comment yet; this is not exactly the solution you are looking for, but I hope it fixes the problem you are having.
You can set the search space in the learner before putting it in the graph, i.e. stick with your original parameter names. After you create the GraphLearner as usual, it will have the desired search space.
A concrete example:
library(mlr3verse)
learner = lrn("regr.rpart", cp = to_tune(0.1, 0.2))
glrn = as_learner(po("pca") %>>% po("learner", learner))
at = auto_tuner(
  "random_search",
  glrn,
  rsmp("holdout"),
  term_evals = 10
)
task = tsk("mtcars")
at$train(task)
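To verify the renaming (a sketch; $search_space() is the paradox helper that collects to_tune() tokens into a tuning space), inspect the GraphLearner's parameter set:
# the tuned parameter is now prefixed with the learner id, i.e. regr.rpart.cp
print(glrn$param_set$search_space())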

How to get multiple answers from the context using BertForQuestionAnswering

How do I get multiple answers from the text using BertForQuestionAnswering? For example, for the question below there are two possible answers:
a nice puppet
a software engineer
Below is the code snippet for the same:
from transformers import BertTokenizer, BertForQuestionAnswering
import torch
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')
question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet.Jim Henson was a software engineer."
input_ids = tokenizer.encode(question, text)
token_type_ids = [0 if i <= input_ids.index(102) else 1 for i in range(len(input_ids))]
start_scores, end_scores = model(torch.tensor([input_ids]), token_type_ids=torch.tensor([token_type_ids]))
all_tokens = tokenizer.convert_ids_to_tokens(input_ids)
answer = ' '.join(all_tokens[torch.argmax(start_scores) : torch.argmax(end_scores)+1])
print(answer)
'a software engineer'
Thanks in advance!!
You are simply outputting the most likely answer according to BERT. If you want multiple answers, you need to actually select multiple answers; in this case, the first and second most likely. To do this, select the second most likely start and end positions from the start_scores and end_scores variables. I recommend torch.topk():
# torch.topk returns (values, indices); take the two highest-scoring
# start and end positions
_, start_idx = torch.topk(start_scores, k=2)
_, end_idx = torch.topk(end_scores, k=2)
answer1 = ' '.join(all_tokens[start_idx[0][0] : end_idx[0][0] + 1])
answer2 = ' '.join(all_tokens[start_idx[0][1] : end_idx[0][1] + 1])
print(answer1, answer2)
Note that pairing the second-best start with the second-best end is a heuristic; a more robust approach scores all valid (start, end) combinations.
Let me know how this works.

What's the difference between optimizer.compute_gradients() and tf.gradients() in TensorFlow?

The following code I've written fails at self.optimizer.compute_gradients(self.output, all_variables):
import tensorflow as tf
import tensorlayer as tl
from tensorflow.python.framework import ops
import numpy as np
class Network1():
    def __init__(self):
        ops.reset_default_graph()
        tl.layers.clear_layers_name()
        self.sess = tf.Session()
        self.optimizer = tf.train.AdamOptimizer(learning_rate=0.1)
        self.input_x = tf.placeholder(tf.float32, shape=[None, 784], name="input")
        input_layer = tl.layers.InputLayer(self.input_x)
        relu1 = tl.layers.DenseLayer(input_layer, n_units=800, act=tf.nn.relu, name="relu1")
        relu2 = tl.layers.DenseLayer(relu1, n_units=500, act=tf.nn.relu, name="relu2")
        self.output = relu2.all_layers[-1]
        all_variables = relu2.all_layers
        self.gradient = self.optimizer.compute_gradients(self.output, all_variables)
        init_op = tf.initialize_all_variables()
        self.sess.run(init_op)
with the error:
TypeError: Argument is not a tf.Variable: Tensor("relu1/Relu:0",
shape=(?, 800), dtype=float32)
However, when I change that line to tf.gradients(self.output, all_variables), the code works fine; at least no error is reported. What did I miss? I thought these two methods do the same thing, that is, return a list of (gradient, variable) pairs.
optimizer.compute_gradients wraps tf.gradients(), as you can see here. It does additional asserts (which explains your error).
I would like to add a simple point to the above answer. optimizer.compute_gradients returns a list of (gradient, variable) tuples. The variables are always present, but the gradients may be None: the gradient of the loss with respect to some of the variables in var_list can be None, meaning the loss has no dependency on them.
On the other hand, tf.gradients only returns a list of sum(dy/dx) gradients, one per variable; it must be paired with the variable list when applying the gradient update.
Hence, the following two approaches can be used interchangeably:
### Approach 1 ###
variable_list = desired_list_of_variables
gradients = optimizer.compute_gradients(loss, var_list=variable_list)
optimizer.apply_gradients(gradients)

### Approach 2 ###
variable_list = desired_list_of_variables
gradients = tf.gradients(loss, variable_list)  # the variables are passed positionally, not as var_list
optimizer.apply_gradients(zip(gradients, variable_list))
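To make the None-gradient point concrete, a tiny sketch (TF1-style graph mode; all names here are illustrative):
import tensorflow as tf

x = tf.Variable(1.0)
y = tf.Variable(2.0)
loss = 3.0 * x            # the loss has no dependency on y

opt = tf.train.GradientDescentOptimizer(0.1)
grads_and_vars = opt.compute_gradients(loss, var_list=[x, y])
# grads_and_vars contains (None, y): the gradient for y is None because the
# loss does not depend on it; dropping None entries is a common safeguard
filtered = [(g, v) for g, v in grads_and_vars if g is not None]
train_op = opt.apply_gradients(filtered)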

PC-Stable from pcalg

I am using PC-Stable from the package pcalg (version 2.0-10) to learn a structure. As I understand it, this algorithm should not be affected by the order of the input data, because it is order-independent. However, when I run it with different column orders, I get different graphs. Can anyone help me with this issue? This is my code:
library(pracma)
library(pcalg)  # for pc() and disCItest
randindexMatrix <- matrix(0, 10, ncol(TrainData))
numberUnique_val_col <- vector()
pdf("Graph for Test PC Stable with random order.pdf")
par(mfrow = c(2, 1))
for (i in 1:10) {
  randindex <- randperm(1:ncol(TrainData))
  randindexMatrix[i, ] <- randindex
  TrainDataRandOrder <- TrainData[, randindex]  # was data[, randindex], presumably a typo
  V <- colnames(TrainDataRandOrder)
  UD <- data.frame(TrainDataRandOrder)
  # number of levels per (discrete) column
  numberUnique_val_col <- sapply(UD, function(x) length(unique(x)))
  suffStat <- list(dm = TrainDataRandOrder,
                   nlev = numberUnique_val_col[1:20],
                   adaptDF = FALSE)
  pc.fit <- pc(suffStat, indepTest = disCItest, alpha = 0.01, labels = V,
               fixedGaps = NULL, fixedEdges = NULL, NAdelete = TRUE, m.max = Inf,
               skel.method = "stable", conservative = TRUE,
               solve.confl = TRUE, verbose = TRUE)
  plot(pc.fit)  # assumed intent of the pdf() call above; requires Rgraphviz
}
dev.off()
The "Stable" part of PC-Stable only affects the Skeleton phase of the algorithm. The Orientation phase is still order-dependent. Do the two graphs have identical "skeletons"? That is, if you convert all directed edges into undirected edges, are the two graphs identical?
If not, you may have uncovered a bug in pcalg! Please post a sample dataset and two orderings of the columns that produce graphs with different skeletons.
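A minimal sketch of that check, assuming pc.fit1 and pc.fit2 are two pc() results on differently ordered columns: symmetrize each adjacency matrix so edge direction is ignored, align the variable order, then compare.
adj1 <- as(pc.fit1@graph, "matrix")
adj2 <- as(pc.fit2@graph, "matrix")
adj2 <- adj2[rownames(adj1), colnames(adj1)]  # align variable order
# an edge is in the skeleton if it appears in either direction
skel1 <- (adj1 + t(adj1)) > 0
skel2 <- (adj2 + t(adj2)) > 0
identical(skel1, skel2)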

Correct syntax for using custom methods with caret package

I get the following error in caret trainControl() using the custom methods syntax documented in the package vignette (pdf) on page 46. Does anyone know if this document is out of date or incorrect? It seems at odds with the caret documentation page, where the "custom" parameter is not used.
fitControl <- trainControl(custom = list(parameters = lpgrnn$grid, model = lpgrnn$fit,
                                         prediction = lpgrnn$predict, probability = NULL,
                                         sort = lpgrnn$sort, method = "cv"),
                           number = 10)
Error in trainControl(custom = list(parameters = lpgrnn$grid, model = lpgrnn$fit, :
unused argument (custom = list(parameters = lpgrnn$grid, model = lpgrnn$fit,
prediction = lpgrnn$predict, probability = NULL, sort = lpgrnn$sort, method = "cv",
number = 10))
The cited pdf is out of date. The caret website is the canonical source of documentation.
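For reference, a hedged sketch of the current interface from caret's "Using Your Own Model in train" chapter: the custom method is passed to train() as a plain list, and trainControl() no longer accepts a custom argument. paramInfo, trainX, and trainY below are assumed placeholders; the lpgrnn fields are the question's own functions.
grnnMethod <- list(
  type = "Regression",      # or "Classification"
  library = NULL,
  loop = NULL,
  parameters = paramInfo,   # assumed: data.frame with columns parameter, class, label
  grid = lpgrnn$grid,
  fit = lpgrnn$fit,
  predict = lpgrnn$predict,
  prob = NULL,
  sort = lpgrnn$sort
)
fitControl <- trainControl(method = "cv", number = 10)
fit <- train(x = trainX, y = trainY, method = grnnMethod, trControl = fitControl)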
