Seaborn FacetGrid: dodge keyword not implemented while mapping a stripplot

Using Seaborn, I'm trying to generate a factorplot with each subplot showing a stripplot. In the stripplot, I'd like to control a few aspects of the markers.
Here is the first method I tried:
import seaborn as sns
tips = sns.load_dataset("tips")
g = sns.FacetGrid(tips, col="time", hue="smoker")
g = g.map(sns.stripplot, 'day', "tip", edgecolor="black",
          linewidth=1, dodge=True, jitter=True, size=10)
This produced the following output, without dodge:
While most of the keyword arguments took effect, the hue levels were not dodged.
I was successful with another approach:
kws = dict(s=10, linewidth=1, edgecolor="black")
tips = sns.load_dataset("tips")
sns.factorplot(x='day', y='tip', hue='smoker', col='time', data=tips,
               kind='strip', jitter=True, dodge=True, legend=False, **kws)
This gives the correct output:
In this output, the hue is dodged.
My question is: why did g.map(sns.stripplot...) not dodge the hue?

The hue variable needs to be passed to the sns.stripplot function via g.map, instead of being set as hue on the FacetGrid itself.
import seaborn as sns
tips = sns.load_dataset("tips")
g = sns.FacetGrid(tips, col="time")
g = g.map(sns.stripplot, 'day', "tip", "smoker", edgecolor="black",
          linewidth=1, dodge=True, jitter=True, size=10)
This is because map calls sns.stripplot individually for each value in the time column and, if hue is set on the complete FacetGrid, also for each hue value, so that dodge would lose its meaning within each individual call.
I can agree that this behaviour is not very intuitive unless you look at the source code of map itself.
Note that the above solution causes a Warning:
lib\site-packages\seaborn\categorical.py:1166: FutureWarning: elementwise comparison failed;
returning scalar instead, but in the future will perform elementwise comparison
  hue_mask = self.plot_hues[i] == hue_level
I honestly don't know what this is telling us, but it does not seem to affect the solution for now.
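As a side note: in seaborn 0.9+, factorplot has been renamed to catplot, so a hedged sketch of the working call under that assumption would be:
import seaborn as sns

tips = sns.load_dataset("tips")
kws = dict(s=10, linewidth=1, edgecolor="black")
# kind='strip' with dodge=True separates the hue levels, as factorplot did
sns.catplot(x='day', y='tip', hue='smoker', col='time', data=tips,
            kind='strip', jitter=True, dodge=True, legend=False, **kws)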

Related

Constraints for Direct Collocation method in pydrake

I am looking for a way to describe the constraints of the Direct Collocation method in pydrake.
I built the robot model from my own URDF by using FindResource, as in this example (l. 11-16).
Then I tried to write some functions that calculate the positions of the joints, like swing_foot_height(q) from this example.
However, there is a problem; it appears to be a type error.
I defined q as follows:
robot = MultibodyPlant(time_step=0.0)
scene_graph = SceneGraph()
robot.RegisterAsSourceForSceneGraph(scene_graph)
file_name = FindResource("models/robot.urdf")
Parser(robot).AddModelFromFile(file_name)
robot.Finalize()
context = robot.CreateDefaultContext()
dircol = DirectCollocation(
    robot,
    context,
    ...(Omission)...
    input_port_index=robot.get_actuation_input_port().get_index())
x = dircol.state()
nq = robot.num_positions()
q = x[0:nq]
Then I used this q in the function, like swing_foot_height(q).
The error looks like this:
SetPositions(): incompatible function arguments. The following argument types are supported:
...
q: numpy.ndarray[numpy.float64[m, 1]]
...
Invoked with:
...
array([Variable('x(0)', Continuous), ... Variable('x(9)', Continuous)],dtype=object)
Is there some way to avoid this error?
Right. In the compass gait notebook that you cited, there was an important line:
# overwrite MultibodyPlant with its autodiff copy
compass_gait = compass_gait.ToAutoDiffXd()
so that the multibody plant being used in the constraint is actually an AutoDiffXd version of the plant.
The littledog notebook has more examples of this, with a more robust implementation that works for both float and autodiff constraint evaluations.
As far as I understand it, trajectory optimization with DirectCollocation converts the data type of the decision variables (in your case, x and q) to the AutoDiffXd type. That is the type you're seeing in the "Invoked with" error message; it is the type used for automatic differentiation, which provides the gradients for the optimization solver.
You'll need to convert back to float to use the SetPositions() function.
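Putting the ToAutoDiffXd suggestion above into a minimal sketch against the question's code (hedged: whether swing_foot_height needs further changes depends on what its body computes):
# overwrite the float MultibodyPlant with its autodiff copy, as in the compass gait notebook
robot_ad = robot.ToAutoDiffXd()
context_ad = robot_ad.CreateDefaultContext()

def swing_foot_height(q):
    # during optimization, q arrives as autodiff values, so use the autodiff plant
    robot_ad.SetPositions(context_ad, q)
    ...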

Error when trying to get tidy summaries with broomstick for multiple randomForest models

I'm grouping my data frame and fitting each group's data with a random forest model, and then using broomstick to get tidy outputs for each group's model. I'm running into trouble when I get to tidy and augment.
I can group the data and fit the models.
library(tidyverse)
library(broomstick)
library(randomForest)
data <- data.frame(y = rep(rep(c(1, 0), each = 100), 5),
                   group = rep(c("A", "B", "C", "D", "E"), each = 200),
                   x1 = rnorm(2000),
                   x2 = rnorm(2000),
                   x3 = rnorm(2000),
                   x4 = rnorm(2000),
                   x5 = rnorm(2000))
GroupModels <- data %>%
  nest(data = -group) %>%
  mutate(fit = map(data, ~ randomForest(y ~ ., ntree = 101, mtry = 2, data = .x, importance = TRUE)))
I then map glance to the fitted models and that works. I get mse and rsq for each group.
GroupModels %>%
  mutate(glanced = map(fit, glance)) %>%
  unnest(glanced) %>%
  select(-data, -fit) %>%
  as.data.frame()
If I map tidy to the fitted models, I get an output plus a deprecation warning, and I don't understand where tibble::as_tibble() should come into play.
GroupModels %>%
  mutate(tidied = map(fit, tidy)) %>%
  unnest(tidied) %>%
  select(-data, -fit) %>%
  as.data.frame()
1: Problem with `mutate()` column `tidied`.
ℹ `tidied = map(fit, tidy)`.
ℹ This function is deprecated as of broom 0.7.0 and will be removed from a future release.
Please see tibble::as_tibble().
If I map augment to the models I get an error and I'm not sure what to do with that.
GroupModels %>%
  mutate(augmented = map(fit, augment)) %>%
  unnest(augmented) %>%
  select(-data, -fit) %>%
  as.data.frame()
Error: Problem with `mutate()` column `augmented`.
ℹ `augmented = map(fit, augment)`.
✖ argument must be coercible to non-negative integer
I can now get augment to work using map2(); I didn't know about it, but it's handy when you need both the fit and the data for a function. I guess I'll worry about the deprecation warning when it happens.
GroupModels %>%
  mutate(augmented = map2(fit, data, augment)) %>%
  unnest(augmented) %>%
  select(-data, -fit) %>%
  as.data.frame()

Provide custom gradient to drake::MathematicalProgram

Drake has an interface where you can give it a generic function as a constraint and it can set up the nonlinearly-constrained mathematical program automatically (as long as it supports AutoDiff). I have a situation where my constraint does not support AutoDiff (the constraint function conducts a line search to approximate the maximum value of some function), but I have a closed-form expression for the gradient of the constraint. In my case, the math works out so that it's difficult to find a point on this function, but once you have that point it's easy to linearize around it.
I know many optimization libraries will allow you to provide your own analytical gradient when available; can you do this with Drake's MathematicalProgram as well? I could not find mention of it in the MathematicalProgram class documentation.
Any help is appreciated!
It's definitely possible, but I admit we haven't provided helper functions that make it pretty yet. Please let me know if/how this helps; I will plan to tidy it up and add it as an example or code snippet that we can reference in drake.
Consider the following code:
from pydrake.all import AutoDiffXd, MathematicalProgram, Solve

prog = MathematicalProgram()
x = prog.NewContinuousVariables(1, 'x')

def cost(x):
    return (x[0] - 1.) * (x[0] - 1.)

def constraint(x):
    if isinstance(x[0], AutoDiffXd):
        print(x[0].value())
        print(x[0].derivatives())
    return x

cost_binding = prog.AddCost(cost, vars=x)
constraint_binding = prog.AddConstraint(
    constraint, lb=[0.], ub=[2.], vars=x)
result = Solve(prog)
When we register the cost or constraint with MathematicalProgram in this way, we are allowing that it can get called with either x being a float, or x being an AutoDiffXd -- which is simply a wrapping of Eigen's AutoDiffScalar (with dynamically allocated derivatives of type double). The snippet above shows you roughly how it works -- every scalar value has a vector of (partial) derivatives associated with it. On entry to the function, you are passed x with the derivatives of x set to dx/dx (which will be 1 or zero).
Your job is to return a value, call it y, with the value set to the value of your cost/constraint, and the derivatives set to dy/dx. Normally, all of this happens magically for you. But it sounds like you get to do it yourself.
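To make that value/derivatives pairing concrete, here is a small standalone illustration (not part of the original snippet; it uses only the AutoDiffXd constructor and arithmetic):
import numpy as np
from pydrake.all import AutoDiffXd

# a scalar with value 3.0 and derivative vector [1, 0], i.e. da/dx1 = 1, da/dx2 = 0
a = AutoDiffXd(3.0, np.array([1.0, 0.0]))
b = a * a
print(b.value())        # 9.0
print(b.derivatives())  # [6. 0.] -- the chain rule: d(a*a)/dx = 2*a * da/dx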
Here's a very simple code snippet that, I hope, gets you started:
from pydrake.all import AutoDiffXd, MathematicalProgram, Solve

prog = MathematicalProgram()
x = prog.NewContinuousVariables(1, 'x')

def cost(x):
    return (x[0] - 1.) * (x[0] - 1.)

def constraint(x):
    if isinstance(x[0], AutoDiffXd):
        # construct the return value yourself: value 2*x, derivatives 2*dx/dx
        y = AutoDiffXd(2 * x[0].value(), 2 * x[0].derivatives())
        return [y]
    return 2 * x

cost_binding = prog.AddCost(cost, vars=x)
constraint_binding = prog.AddConstraint(
    constraint, lb=[0.], ub=[2.], vars=x)
result = Solve(prog)
Let me know?

Scikit-learn: How to extract features from the text?

Assume I have an array of Strings:
['Laptop Apple Macbook Air A1465, Core i7, 8Gb, 256Gb SSD, 15"Retina, MacOS' ... 'another device description']
I'd like to extract from this description features like:
item=Laptop
brand=Apple
model=Macbook Air A1465
cpu=Core i7
...
Should I prepare the pre-defined known features first? Like
brands = ['apple', 'dell', 'hp', 'asus', 'acer', 'lenovo']
cpu = ['core i3', 'core i5', 'core i7', 'intel pdc', 'core m', 'intel pentium', 'intel core duo']
I am not sure that I need CountVectorizer or TfidfVectorizer here; DictVectorizer seems more appropriate, but how can I build dicts whose keys take their values from the entire string?
Is this possible with scikit-learn's feature extraction, or should I write my own .fit() and .transform() methods?
UPDATE:
@sergzach, please review whether I understood you correctly:
data = ['Laptop Apple Macbook..', 'Laptop Dell Latitude...'...]
for d in data:
    for brand in brands:
        if brand in d:
            break  # ok, brand is found
    for model in models:
        if model in d:
            break  # ok, model is found
So I'd be creating N loops, one per feature? This might work, but I'm not sure it is right and flexible.
Yes, something like the following. (Excuse me; you may need to correct the code below.)
import re

data = ['Laptop Apple Macbook..', 'Laptop Dell Latitude...'...]
features = {
    'brand': [r'apple', r'dell', r'hp', r'asus', r'acer', r'lenovo'],
    'cpu': [r'core\s+i3', r'core\s+i5', r'core\s+i7', r'intel\s+pdc', r'core\s+m', r'intel\s+pentium', r'intel\s+core\s+duo']
    # and other features
}
cat_data = []           # your categories, which you should convert into numbers
not_found_columns = []
for line in data:
    line_cats = {}
    for col, patterns in features.items():  # renamed from 'features' to avoid shadowing the dict
        found = False
        for i, pattern in enumerate(patterns):
            if re.findall(pattern, line.lower(), flags=re.UNICODE):
                line_cats[col] = i + 1  # numeric category for the column; e.g. dell -> 2, acer -> 5
                found = True
                break                   # the category is determined by the first occurrence
        # the loop ended but the feature was not found: use 0 as the default "not found" value
        if not found:
            line_cats[col] = 0
            not_found_columns.append((col, line))
    cat_data.append(line_cats)
# cat_data now maps, for each line, every column to a categorical index (i + 1),
# or 0 if no feature was detected.
Now you have the column names and lines (not_found_columns) for which no feature was found. Review them; you probably forgot some features.
We could also write strings (instead of numbers) as categories and then use DictVectorizer; in the end the two approaches are equivalent.
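For reference, a minimal sketch of that string-category variant (the example dicts are hypothetical, and get_feature_names_out assumes scikit-learn >= 1.0):
from sklearn.feature_extraction import DictVectorizer

dv = DictVectorizer(sparse=False)
# DictVectorizer one-hot encodes the string values into columns like 'brand=apple'
X = dv.fit_transform([{'brand': 'apple', 'cpu': 'core i7'},
                      {'brand': 'dell', 'cpu': 'core i5'}])
print(dv.get_feature_names_out())
print(X)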
Scikit-learn's vectorizers will convert an array of strings into a document-term matrix (a 2-D array with a column for each term/word found). Each row (first dimension) of the original array maps to a row of the output matrix. Each cell holds a count or a weight, depending on which kind of vectorizer you use and its parameters.
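For illustration, a hedged sketch of that behaviour with CountVectorizer (strings shortened from the question; get_feature_names_out assumes scikit-learn >= 1.0):
from sklearn.feature_extraction.text import CountVectorizer

docs = ['Laptop Apple Macbook Air, Core i7, 8Gb',
        'Laptop Dell Latitude, Core i5, 4Gb']
vec = CountVectorizer()
X = vec.fit_transform(docs)         # sparse document-term matrix, one row per string
print(vec.get_feature_names_out())  # the terms found across the corpus
print(X.toarray())                  # per-document term counts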
I am not sure this is what you need, based on your code. Could you tell us where you intend to use the features you are looking for? Do you intend to train a classifier? For what purpose?

IllegalArgumentException when using weka.clusterers.HierarchicalClusterer

I searched a lot, but I was not able to find any example code, which describes how to use the WEKA HierarchicalClusterer. Using the following C#-code gives me an IllegalArgumentException at "agg.buildClusterer(insts);".
weka.clusterers.HierarchicalClusterer agg = new weka.clusterers.HierarchicalClusterer();
agg.setNumClusters(NumCluster);
/*
Tag[] TAGS_LINK_TYPE = agg.getLinkType().getTags();
agg.setLinkType(new SelectedTag(1, TAGS_LINK_TYPE));
*/
agg.buildClusterer(insts);
for (int i = 0; i < insts.numInstances(); i++)
{
    int clusterNumber = agg.clusterInstance(insts.instance(i));
}
The StackTrace says:
at java.util.PriorityQueue..ctor(Int32 initialCapacity, Comparator comparator)
at weka.clusterers.HierarchicalClusterer.doLinkClustering(Int32 , Vector[] , Node[] )
at weka.clusterers.HierarchicalClusterer.buildClusterer(Instances data)
but no Message or InnerException is specified.
The variable "insts" is an Instances object, which only holds instances with an equal number of numerical attributes.
Is anyone able to quickly find my error or please post/link some example code?
Further, is the setting of the LinkType (commented code) correct?
Thanks,
Björn
The HierarchicalClusterer class has a TAGS_LINK_TYPE attribute, so something like
agg.setLinkType(new SelectedTag(1, HierarchicalClusterer.TAGS_LINK_TYPE));
will achieve what you are after for setting the linking. Now what on earth does that 1 mean? From the javadocs we see what TAGS_LINK_TYPE contains:
-L Link type (Single, Complete, Average, Mean, Centroid, Ward, Adjusted complete, Neighbor Joining)
[SINGLE|COMPLETE|AVERAGE|MEAN|CENTROID|WARD|ADJCOMLPETE|NEIGHBOR_JOINING]
In general, your code looks OK for the C# case. I see you don't set the distance metric in your example above; maybe you would want to do this? I too use Weka as best I can with C# via IKVM. I have found that the dataset allowed for hierarchical clustering cannot be too large. Maybe your dataset exceeds what WEKA can handle, and you would avoid your error if you reduced the size of the dataset?
