Scikit-optimize (skopt) gives an error when trying to use BayesSearchCV - machine-learning

Here is my code:
import warnings
import xgboost as xgb
from skopt import BayesSearchCV
warnings.filterwarnings('ignore', message='The objective has been evaluated at this point before.')
params = {
    'min_child_weight': (0, 50),
    'max_depth': (0, 10),
    'subsample': (0.5, 1.0),
    'colsample_bytree': (0.5, 1.0),
    # 'reg_lambda': (1e-5, 100, 'log-uniform'),
    # 'reg_alpha': (1e-5, 100, 'log-uniform'),
    'learning-rate': (0.01, 0.2, 'log-uniform'),
}
bayes = BayesSearchCV(xgb.XGBRegressor(), params, n_iter=10, scoring='neg_mean_squared_error', cv=5, random_state=42)
res = bayes.fit(X_train, y_train)
print(res.best_params_)
My program only runs if I comment out these three params; if I leave them active,
it gives me this error:
ValueError: Not all points are within the bounds of the space.
How can I still include these params, and why are they rejected?


How to update the bounds of a PositionConstraint?

We want to speed up the IK solve process, so we want to update the bounds of the IK constraints instead of creating new constraints every time we solve an IK problem.
We noticed that there are APIs like set_bounds() to update bounds in BoundingBoxConstraint, LinearConstraint, etc., but the bounds of a PositionConstraint can't be updated this way.
Is this discouraged, or is there some way to update the bounds of a PositionConstraint?
Thanks!
The PR #16631 was just merged into Drake. Now you can use set_bounds, UpdateLowerBound and UpdateUpperBound in PositionConstraint.
PositionConstraint.set_bounds documentation
constraint = ik.PositionConstraint(
    plant=variables.plant,
    frameA=variables.body1_frame,
    p_AQ_lower=[-0.1, -0.2, -0.3],
    p_AQ_upper=[-0.05, -0.12, -0.28],
    frameB=variables.body2_frame,
    p_BQ=[0.2, 0.3, 0.5], plant_context=variables.plant_context)
constraint.UpdateLowerBound(new_lb=np.array([-2, -3, -0.5]))
constraint.UpdateUpperBound(new_ub=np.array([10., 0.5, 2.]))
constraint.set_bounds(new_lb=[-1, -2, -2.], new_ub=[1., 2., 3.])

Constructing discrete table-based CPDs in tensorflow-probability?

I'm trying to construct the simplest example of a Bayesian network with several discrete random variables and conditional probabilities (the "Student Network" from Koller's book, see 1).
Although it is a bit unwieldy, I managed to build this network using pymc3. In particular, creating the CPDs is not that straightforward in pymc3; see the snippet below:
import pymc3 as pm
...
with pm.Model() as basic_model:
    # parameters for categorical are indexed as [0, 1, 2, ...]
    difficulty = pm.Categorical(name='difficulty', p=[0.6, 0.4])
    intelligence = pm.Categorical(name='intelligence', p=[0.7, 0.3])
    grade = pm.Categorical(name='grade',
        p=pm.math.switch(
            theano.tensor.eq(intelligence, 0),
            pm.math.switch(
                theano.tensor.eq(difficulty, 0),
                [0.3, 0.4, 0.3],  # I=0, D=0
                [0.05, 0.25, 0.7]  # I=0, D=1
            ),
            pm.math.switch(
                theano.tensor.eq(difficulty, 0),
                [0.9, 0.08, 0.02],  # I=1, D=0
                [0.5, 0.3, 0.2]  # I=1, D=1
            )
        )
    )
    letter = pm.Categorical(name='letter', p=pm.math.switch(
    ...
But I have no idea how to build this network using tensorflow-probability (versions: tfp-nightly==0.7.0.dev20190517, tf-nightly-2.0-preview==2.0.0.dev20190517).
For the unconditioned binary variables, one can use a categorical distribution, such as
from tensorflow_probability import distributions as tfd
from tensorflow_probability import edward2 as ed
difficulty = ed.RandomVariable(
    tfd.Categorical(
        probs=[0.6, 0.4],
        name='difficulty'
    )
)
But how to construct the CPDs?
There are a few classes/methods in tensorflow-probability that might be relevant (in tensorflow_probability/python/distributions/deterministic.py, or the deprecated ConditionalDistribution), but the documentation is rather sparse (one needs a deep understanding of tfp).
--- Updated question ---
Chris' answer is a good starting point. However, things are still a bit unclear even for a very simple two-variable model.
This works nicely:
jdn = tfd.JointDistributionNamed(dict(
    dist_x=tfd.Categorical([0.2, 0.8], validate_args=True),
    dist_y=lambda dist_x: tfd.Bernoulli(probs=tf.gather([0.1, 0.9], indices=dist_x), validate_args=True)
))
print(jdn.sample(10))
but this one fails
jdn = tfd.JointDistributionNamed(dict(
    dist_x=tfd.Categorical([0.2, 0.8], validate_args=True),
    dist_y=lambda dist_x: tfd.Categorical(probs=tf.gather_nd([[0.1, 0.9], [0.5, 0.5]], indices=[dist_x]))
))
print(jdn.sample(10))
(I'm trying to model categorical explicitly in the second example just for learning purposes)
-- Update: solved ---
Obviously, the last example wrongly used tf.gather_nd instead of tf.gather, as we only wanted to select the first or the second row based on the dist_x outcome. This code works now:
jdn = tfd.JointDistributionNamed(dict(
    dist_x=tfd.Categorical([0.2, 0.8], validate_args=True),
    dist_y=lambda dist_x: tfd.Categorical(probs=tf.gather([[0.1, 0.9], [0.5, 0.5]], indices=[dist_x]))
))
print(jdn.sample(10))
The tricky thing about this, and presumably the reason it's subtler than expected in PyMC, is -- as with almost everything in vectorized programming -- handling shapes.
In TF/TFP, the (IMO) nicest way to solve this is with one of the new TFP JointDistribution{Sequential,Named,Coroutine} classes. These let you naturally represent hierarchical PGM models, and then sample from them, evaluate log probs, etc.
I whipped up a colab notebook demoing all 3 approaches, for the full student network: https://colab.research.google.com/drive/1D2VZ3OE6tp5pHTsnOAf_7nZZZ74GTeex
Note the crucial use of tf.gather and tf.gather_nd to manage the vectorization of the various binary and categorical switching.
Have a look and let me know if you have any questions!
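The role of tf.gather and tf.gather_nd in this answer is to pick one row of a probability table per sample. A minimal sketch of that lookup follows, using NumPy fancy indexing as a stand-in (this is not TFP code, and GRADE_CPD and sample_student_network are illustrative names that do not appear in the notebook). It uses the grade table from the question:

```python
import numpy as np

# Grade CPD from the question, indexed as [intelligence, difficulty, grade].
GRADE_CPD = np.array([
    [[0.3, 0.4, 0.3], [0.05, 0.25, 0.7]],    # I=0: D=0, D=1
    [[0.9, 0.08, 0.02], [0.5, 0.3, 0.2]],    # I=1: D=0, D=1
])


def sample_student_network(n, seed=0):
    """Draw n joint samples of (difficulty, intelligence, grade)."""
    rng = np.random.default_rng(seed)
    difficulty = rng.choice(2, size=n, p=[0.6, 0.4])
    intelligence = rng.choice(2, size=n, p=[0.7, 0.3])
    # Indexing with two integer arrays selects one probability row per
    # sample, so the result has shape (n, 3). This is the shape handling
    # that a gather-style lookup performs.
    grade_probs = GRADE_CPD[intelligence, difficulty]
    # Vectorized inverse-CDF sampling from each row.
    cdf = np.cumsum(grade_probs, axis=1)
    u = rng.random(n)
    grade = (u[:, None] > cdf).sum(axis=1)
    grade = np.minimum(grade, 2)  # guard against floating-point round-off
    return difficulty, intelligence, grade, grade_probs


difficulty, intelligence, grade, grade_probs = sample_student_network(1000)
print(grade_probs.shape)
```

Each parent variable contributes one index array, so a CPD with more parents only needs a table with more leading axes.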

most cv2 tuple arguments don't work in python 3.5 (windows 7)

I switched a project over from python 2.7 to 3.5 and now I can't use most functions that require tuples.
As an example:
rgb = (255,0,0)
cv2.circle(img,(x, y),2,rgb,-1)
will return the "new style getargs format but argument is not a tuple" system error.
No matter how I enter the tuple expressing the color of the circle it will always fail, even if I explicitly use "tuple()"
I realize this problem isn't new but the solutions available are package dependent (https://mail.python.org/pipermail/python-dev/2017-January/147091.html).
I just want to put dots on an image without having to bring in another library when the same script is already using opencv.
EDIT: It's complaining about the (x, y) argument. The reason this worked in 2.7 and not in 3.5 is unclear, but explicitly declaring the value as a tuple() fixes the issue.
In Python 3.5, the following works:
import numpy as np
import cv2
img = np.zeros((100, 100)) # Black image
rgb = (255, 0, 0)
cv2.circle(img, (50, 50), 2, rgb, -1) # Plot centered on (50, 50)
Could you try this on your system and see if the error persists?
(Using Windows 10, Anaconda3, OpenCV 3.1.0)
If this works, your issue may be related to the type of your variable 'img' or the type of its values (see the link I posted as a comment on your question).
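The fix noted in the question's edit, passing the center as an explicit tuple, can be wrapped in a small helper. The sketch below assumes the failing coordinates came out of a NumPy computation (for example, as NumPy scalars or floats), which the original post does not confirm. The name as_point is hypothetical.

```python
import numpy as np


def as_point(x, y):
    """Return (x, y) as a tuple of plain Python ints.

    Hypothetical helper: assumes x and y may be NumPy scalars or floats.
    """
    return (int(round(float(x))), int(round(float(y))))


# Coordinates as they might come out of a NumPy computation (assumed).
center = as_point(np.float64(50.4), np.int32(20))
print(center)

# With OpenCV installed, the point would then be passed as:
# cv2.circle(img, center, 2, (255, 0, 0), -1)
```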

TensorFlow Classification Using Dataset

I need to utilize TensorFlow for a project to classify items based on their attributes to a certain class (either 1, 2, or 3).
The only problem is that almost every TF tutorial or example I find online is about image recognition or text classification. I can't find anything about classification based on numbers. I guess what I'm asking is where to get started: does anyone know of a relevant example, or am I just thinking about this completely wrong?
We are given the 13 attributes for each item, and need to use the TF neural network to classify each item correctly (or mark the margin of error). But nothing online is showing me even how to start with this kind of dataset.
Example of dataset: (first value is class, other values are attributes)
2, 11.84, 2.89, 2.23, 18, 112, 1.72, 1.32, 0.43, 0.95, 2.65, 0.96, 2.52, 500
3, 13.69, 3.26, 2.54, 20, 107, 1.83, 0.56, 0.5, 0.8, 5.88, 0.96, 1.82, 680
3, 13.84, 4.12, 2.38, 19.5, 89, 1.8, 0.83, 0.48, 1.56, 9.01, 0.57, 1.64, 480
2, 11.56, 2.05, 3.23, 28.5, 119, 3.18, 5.08, 0.47, 1.87, 6, 0.93, 3.69, 465
1, 14.06, 1.63, 2.28, 16, 126, 3, 3.17, 0.24, 2.1, 5.65, 1.09, 3.71, 780
Suppose you have the data in a file, data.txt. You can use NumPy to read it:
import numpy as np
# The sample rows are comma-separated, so pass delimiter=','
xy = np.loadtxt('data.txt', delimiter=',', unpack=True, dtype='float32')
x_data = xy[1:]
y_data = xy[0]
More information: http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.loadtxt.html
You may also need 'np.transpose', depending on the shape of your weights and operations.
x_data = np.transpose(xy[1:])
Then, use 'placeholders' and 'feed_dict' to train/test your model:
X = tf.placeholder("float", ...
Y = tf.placeholder("float", ...
....
with tf.Session() as sess:
    ....
    sess.run(optimizer, feed_dict={X:x_data, Y:y_data})
For this kind of problem, TensorFlow has an in-depth tutorial here,
or on Towards Data Science here.
If you're looking for videos to start with, I think sentdex's tutorials on the Titanic dataset
are what you're looking for, although he uses k-means to do the classification
(actually, I think his entire deep learning/machine learning playlist is a great place to start).
You can find it here.
Otherwise, if you're looking for the basics of how to start:
First, preprocessing:
Separate the data into class labels and inputs (the pandas library should be able to help you with this).
Make your class labels into a one-hot array.
Then normalize the data:
It looks like your data attributes have wildly different ranges; make sure to get them all into the same range, between 0 and 1.
Build your model:
A simple fully connected net should do the trick.
Remember to make the output layer the same size as the number of classes you have.
Use an argmax function on the output of the final layer to decide which class the model thinks is the proper classification.
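The preprocessing steps above can be sketched with NumPy alone. This is a minimal illustration using the five sample rows from the question. The variable names and the random stand-in for the network output are assumptions, and min-max scaling is one possible choice for bringing the attributes into the range 0 to 1.

```python
import numpy as np

# The five sample rows from the question: class first, then 13 attributes.
rows = np.array([
    [2, 11.84, 2.89, 2.23, 18, 112, 1.72, 1.32, 0.43, 0.95, 2.65, 0.96, 2.52, 500],
    [3, 13.69, 3.26, 2.54, 20, 107, 1.83, 0.56, 0.5, 0.8, 5.88, 0.96, 1.82, 680],
    [3, 13.84, 4.12, 2.38, 19.5, 89, 1.8, 0.83, 0.48, 1.56, 9.01, 0.57, 1.64, 480],
    [2, 11.56, 2.05, 3.23, 28.5, 119, 3.18, 5.08, 0.47, 1.87, 6, 0.93, 3.69, 465],
    [1, 14.06, 1.63, 2.28, 16, 126, 3, 3.17, 0.24, 2.1, 5.65, 1.09, 3.71, 780],
], dtype=np.float64)

# Step 1: separate class labels from inputs.
labels = rows[:, 0].astype(int)   # classes are 1, 2, or 3
features = rows[:, 1:]

# Step 2: one-hot encode the labels (class k maps to column k - 1).
num_classes = 3
one_hot = np.eye(num_classes)[labels - 1]

# Step 3: min-max scale every attribute into [0, 1].
mins = features.min(axis=0)
spans = features.max(axis=0) - mins
spans[spans == 0] = 1.0           # avoid dividing by zero for constant columns
scaled = (features - mins) / spans

# Step 4: turn network outputs into class predictions with argmax.
# The outputs here are random placeholders, not a trained model.
rng = np.random.default_rng(0)
fake_outputs = rng.random((len(rows), num_classes))
predicted = np.argmax(fake_outputs, axis=1) + 1

print(one_hot)
print(scaled.min(), scaled.max())
```

In practice, the minimum and span would be computed on the training set only and reused for test data.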

Can I have the shape information of the shared variable in theano?

Accessing variable.shape raises
AttributeError: 'SharedVariable' object has no attribute 'shape'
while theano.tensor.shape(variable) returns shape.0.
I am really confused about why I can't get the shape information. The same problem occurs when I want to get the shape of a symbolic variable, which seems very odd.
x = T.matrix('x') # the data is presented as rasterized images
y = T.ivector('y') # the labels are presented as 1D vector of
# [int] labels
layer0_input = x.reshape((batch_size, 1, 28, 28))
In the example above, x (a symbolic variable) has been reshaped to some shape. It would not make sense to me if I couldn't retrieve its shape information while still being able to assign it a new shape.
The first error is probably due to the fact that you tried to evaluate the shape property on the data type SharedVariable, not on an actual shared variable.
Otherwise, obtaining shape.0 is completely normal: This is a symbolic expression representing the shape, which is a priori unknown. As soon as you evaluate with data, you will see the shape:
import theano
import theano.tensor as T
import numpy as np
s = theano.shared(np.arange(2 * 3 * 5).reshape(2, 3, 5))
print(s.shape) # gives you shape.0
print(s.shape.eval()) # gives you an array containing 2, 3, 5
a = T.tensor3()
print(a.shape) # gives you shape.0
print(a.shape.eval({a: np.arange(2 * 3 * 5).reshape(2, 3, 5).astype(theano.config.floatX)})) # gives 2, 3, 5
