I have an image of size W x H x 3 that needs to be segmented into 21 classes. After passing it through some CNN layers, I obtain W/4 x H/4 x 512 feature maps. At the end I will use a convolutional and a deconvolutional layer and compare the result with the label in a softmax loss layer. I have 2 topologies:
(1) Input -> Intermediate_layers -> conv_layer -> deconv_layer -> Softmax_loss
(2) Input -> Intermediate_layers -> deconv_layer -> conv_layer -> Softmax_loss
Which topology is better? I have seen both of them: No. 1 in FCN (fully convolutional networks for semantic segmentation), and No. 2 in VoxResNet and U-Net.
There is no single correct way in deep learning; you usually try things and keep whichever works best for you. If you follow the FCN model, it goes conv -> deconv -> conv -> deconv. The most cited papers in semantic segmentation do conv -> deconv, so I would suggest No. 1 too. Also, intuitively, doing the deconv first doesn't make much sense.
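For concreteness, here is a minimal PyTorch sketch of head No. 1 (conv followed by deconv) on top of the W/4 x H/4 x 512 feature maps from the question; the kernel sizes and channel counts below are illustrative assumptions, not taken from any particular paper.
import torch
import torch.nn as nn

# Topology No. 1 head: conv -> deconv -> softmax loss (hyperparameters are assumptions).
head = nn.Sequential(
    nn.Conv2d(512, 21, kernel_size=1),             # 1x1 conv: per-pixel class scores
    nn.ConvTranspose2d(21, 21, kernel_size=8,      # "deconv": upsample x4 back to W x H
                       stride=4, padding=2),
)

features = torch.randn(1, 512, 32, 32)             # dummy W/4 x H/4 x 512 feature maps
logits = head(features)                            # shape (1, 21, 128, 128)
loss = nn.CrossEntropyLoss()(logits, torch.zeros(1, 128, 128, dtype=torch.long))
print(logits.shape, loss.item())
Topology No. 2 would simply swap the two layers: upsample the 512-channel features first, then score them with a convolution.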
The SpatialInertia object in Drake has a method called CopyToFullMatrix6(), which outputs the 6x6 representation of the SpatialInertia object. Is there a function to do the opposite? I.e. I have a 6x6 matrix that I want to make into a SpatialInertia object. (I am using Pydrake locally in Ubuntu 18.04.)
The context is: I'm working with some hydrodynamic modeling. Due to the model of the added mass, the "mass" submatrix of my Spatial Inertia matrix, which typically has three identical mass values on the diagonal, like so:
---------
| m 0 0 |
| 0 m 0 |
| 0 0 m |
---------
actually has different "masses" in the different directions, like so:
------------
| m1  0  0 |
|  0 m2  0 |
|  0  0 m3 |
------------
The consequence of this is that the usual constructors (MakeFromCentralInertia() or just SpatialInertia()) won't work because they take one mass value as an input.
My process right now is to query a body's SpatialInertia, get the 6x6 representation, and add the added-mass spatial inertia matrix to it as a numpy array. Now I need a way to make that matrix into a SpatialInertia again so I can apply it back to the body.
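Here is a minimal pydrake sketch of that process with made-up numbers (the rigid-body inertia and the added-mass values below are placeholders, not real hydrodynamic data):
import numpy as np
from pydrake.multibody.tree import SpatialInertia, UnitInertia

# Placeholder rigid-body spatial inertia: mass, center of mass, unit inertia.
M_rb = SpatialInertia(10.0, np.zeros(3), UnitInertia(0.1, 0.1, 0.1))
M_rb_6x6 = M_rb.CopyToFullMatrix6()

# Placeholder added-mass matrix with direction-dependent "masses" m1, m2, m3
# (assuming the translational block sits in the lower-right 3x3 of the 6x6).
M_added = np.diag([0.0, 0.0, 0.0, 1.0, 2.0, 3.0])

M_total = M_rb_6x6 + M_added
# This is where I am stuck: I don't see a constructor that turns M_total
# back into a SpatialInertia.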
Any thoughts on the process or the conversion are appreciated. Thanks!
A SpatialInertia can only have a scalar mass (it is actually represented that way internally, not as a 6x6 matrix). That is, the class inherently describes the mass properties of a rigid body and can't be used for something more general. The multibody system does have a more general class, ArticulatedBodyInertia, which does represent masses that appear different in different directions due to joint articulation. It is not clear to me how that could be used for hydrodynamics, but at least it can represent the varying masses.
I have the following Bayes net.
I want to find P(+h|+e), so I have to find A = P(+h,+e) and B = P(+e). I wanted to use variable elimination to compute the probability, but taking different elimination orders gives me different probabilities. How should I choose the order of variable elimination for an accurate calculation of P(+h|+e)?
Will it be okay if I calculate P(+h,+u,+e) and eliminate +u, instead of finding P(+i,+h,+t,+u,+e) and eliminating +i, +t, and +u, to get P(+h,+e)?
How do I calculate P(+e)?
1. P(h|e) is a conditional probability of the form P(cause | effect): we are using an effect to infer the cause (the diagnostic direction). By Bayes' rule,
P(h|e) = P(e|h)P(h) / P(e) = P(h,e) / P(e)
So to calculate P(h,e) you would have to compute the joint distribution with all the variables and marginalise out each of the others, since they are all relevant to the query and evidence variables.
P(+i, +h, +t, +u, +e) would be the correct choice
To calculate P(+e) we only need its parents, i.e. "good test taker" (t) and "understands the material" (u). So we take the conditional distribution P(e | t, u) and marginalise out the variables t and u:
P(+e)
= Sum_t( Sum_u( P(+e, t, u) ) )
= Sum_t( Sum_u( P(+e | t, u) P(t) P(u) ) )
= P(+e|+t,+u)P(+t)P(+u) + P(+e|+t,-u)P(+t)P(-u) + P(+e|-t,+u)P(-t)P(+u) + P(+e|-t,-u)P(-t)P(-u)
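As a quick numerical illustration of that marginalisation (all the CPT numbers below are made up, since the actual tables are not shown in the question, and t and u are assumed independent, as in the expansion above):
# Made-up priors and conditional table, for illustration only.
P_t = {"+t": 0.6, "-t": 0.4}           # P(good test taker)
P_u = {"+u": 0.7, "-u": 0.3}           # P(understands the material)
P_e_given_tu = {                       # P(+e | t, u)
    ("+t", "+u"): 0.9, ("+t", "-u"): 0.6,
    ("-t", "+u"): 0.5, ("-t", "-u"): 0.1,
}

# P(+e) = Sum_t Sum_u P(+e | t, u) P(t) P(u)
P_e = sum(P_e_given_tu[(t, u)] * P_t[t] * P_u[u]
          for t in P_t for u in P_u)
print(P_e)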
I am working on a binary classification problem with SparkML. I trained and evaluated my data using Random Forest and Logistic Regression models, and now I want to check how well SVM classifies my data.
Snippet of my training data:-
+----------+------+
| spam | count|
+----------+------+
| No|197378|
| Yes| 7652|
+----------+------+
Note:- My dependent variable: 'spam': string (nullable = true)
+-----+------+
|label| count|
+-----+------+
| 0.0|197488|
| 1.0| 7650|
+-----+------+
Note:- label: double (nullable = false)
Updates to my question:-
trainingData.select('label').distinct().show()
+-----+
|label|
+-----+
| 0.0|
| 1.0|
+-----+
So, I used the code below to fit my training data using LinearSVC:-
from pyspark.ml.classification import LinearSVC
lsvc = LinearSVC()
# Fit the model
lsvcModel = lsvc.fit(trainingData)
In my data frame, the label and the dependent variable have only 2 classes, but I get an error saying 3 classes were detected. I'm not really sure what's causing this exception.
Any help is much appreciated!
Error:-
IllegalArgumentException: u'requirement failed: LinearSVC only supports
binary classification. 3 classes detected in
LinearSVC_4240bb949b9fad486ec0__labelCol'
You can try converting your label values into categorical data using OneHotEncoder with the handleInvalid parameter set to "keep".
I had the same problem.
scala> TEST_DF_37849c70_7cd3_4fd6_a9a0_df4de727df25.select("si_37849c70_7cd3_4fd6_a9a0_df4de727df25_logicProp1_lable_left").distinct.show
+-------------------------------------------------------------+
|si_37849c70_7cd3_4fd6_a9a0_df4de727df25_logicProp1_lable_left|
+-------------------------------------------------------------+
| 0.0|
| 1.0|
+-------------------------------------------------------------+
error: requirement failed: LinearSVC only supports binary classification. 3 classes detected in linearsvc_d18a38204551__labelCol
But in my case, using StringIndexer with the setHandleInvalid("skip") option, it works.
Maybe LinearSVC has some problem with the StringIndexer "keep" option.
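For reference, a minimal PySpark sketch of that fix; the "spam" label column and trainingData come from the question above, while the "features" column name is an assumption:
from pyspark.ml.feature import StringIndexer
from pyspark.ml.classification import LinearSVC

# Index the string label; "skip" drops rows with invalid label values instead of
# mapping them to an extra index (which would create a third class).
indexer = StringIndexer(inputCol="spam", outputCol="label_idx", handleInvalid="skip")
indexed = indexer.fit(trainingData).transform(trainingData)

lsvc = LinearSVC(featuresCol="features", labelCol="label_idx")
lsvcModel = lsvc.fit(indexed)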
I have a question on machine learning training data.
Is there a way to structure the data so that the algorithm learns to make associations between data points? For example, if I hypothetically wanted to train the algorithm on what cats eat, how could I structure the training data so that the algorithm learns to associate cats with the food they eat?
Thank you.
It seems like you're beginning to study machine learning, so let's expand on your example. There are two questions I think you might be asking here:
(1) How do I figure out what cats like to eat?
(2) How can I predict what a cat will eat if I know some extra facts about that cat? How do I structure data to accomplish this?
(1) This is the interpretation that Thomas Pinetz is referring to. To answer the question "what do cats like to eat?" you don't need what most people would consider "machine learning." You can carry out a survey and then use a test for statistical association. But I don't think that's what you're asking for here...
(2) This is machine learning, and it is not just about structuring data. Note that everything below is wildly oversimplified. Training data for machine learning is usually structured in terms of "instances." Suppose you have two kinds of food ("kibbles" and "tuna") and consider this example:
Cat / Features | eye-color | coat-color | ear-length-cm | **food** |
---- | ---- | ---- | ---- | --- |
Socks | "green" | "brown" | 3.0 | "kibbles" |
Jimmy | "blue" | "gray" | 3.7 | "tuna" |
Snowball | "green" | "white" | 2.9 | "kibbles" |
MrTumnus | "blue" | NA | 3.1 | "tuna" |
Tosca | "blue" | "orange" | 3.2 | "kibbles" |
... | ... | ... | ... | ... |
(One would hope for a bigger training set than this...)
Each row of the above is an "instance." The three middle columns are features, facts about each cat. The last column is the food that the cat in question likes to eat, usually called the "class label." The first column is the name of the cat, which I made up for fun. It's useless information, but it lets us refer to our instances more easily here.
Your goal in this case would be to use the middle three columns, your features, to predict the class label. Data structured like this is a common starting point for a machine learning problem.
Now when you choose a way of attacking the problem, you'll be faced with some extra issues:
(1) The MrTumnus instance has missing data, his "coat-color" is NA.
(2) You have both continuous (ear-length-cm) and discrete (eye-color, coat-color) features; depending on the algorithms you throw at this problem, using both kinds of data may be difficult.
Let's suppose you only consider your discrete features (eye-color and coat-color). Some machine learning algorithm we can imagine might take this data and compute probabilities like these:
P(eye-color = "green", food = "kibbles")
P(coat-color = "white", food = "tuna")
P(coat-color = "white", eye-color="blue", food="tuna")
etc. You can see where this is going.
This then gives us a model CatFood(eye-color, coat-color) which can return the food that the cat is most likely to enjoy, given its eye-color and coat-color. More questions arise: what if an eye-color or coat-color is supplied that we haven't seen before? We're only scratching the surface here.
Then, when you have a new cat in front of you, and you want to find out what it might like to eat, based on its eye-color and coat-color, you would collect the data you need, and apply your model. Here's your new instance:
Cat / Features | eye-color | coat-color | **food** |
---- | ---- | ---- | --- |
Oswald | "blue" | "orange" | ? |
Suppose now we apply our model, CatFood("blue", "orange"). It goes back to the probabilities that were computed on our training data and tells us which food, according to the model, the cat is most likely to want to eat.
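Here is a rough scikit-learn sketch of that whole idea on the toy table above, using only the two discrete features (the MrTumnus row is dropped because of its missing coat-color; this is just one of many possible ways to build such a model):
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
from sklearn.naive_bayes import CategoricalNB

# Training instances from the table above (MrTumnus omitted: missing coat-color).
train = pd.DataFrame({
    "eye-color":  ["green", "blue", "green", "blue"],
    "coat-color": ["brown", "gray", "white", "orange"],
    "food":       ["kibbles", "tuna", "kibbles", "kibbles"],
})

enc = OrdinalEncoder()                                          # encode the categorical features
X = enc.fit_transform(train[["eye-color", "coat-color"]]).astype(int)
model = CategoricalNB().fit(X, train["food"])                   # estimates the class probabilities

# The new cat, Oswald.
oswald = pd.DataFrame({"eye-color": ["blue"], "coat-color": ["orange"]})
print(model.predict(enc.transform(oswald).astype(int)))         # most likely food for Oswald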
This is pure statistics. Machine learning is the art of making predictions about the future of things. If you wanted to predict what your cat is going to eat next, then you could apply a machine learning algorithm.
But you want to discover an association between cats and what they eat. If you really want to know this, you should ask a couple of people with cats what they are feeding them and use some statistical model to find the mean.
I have no clue about data mining, data analysis, or statistical analysis, but I think what I need is to find "clusters in a matrix". I have a data set of ~20k records, and each has ~40 characteristics, all of which are either turned on or off.
+--------+------+------+------+------+------+------+
| record | hasA | hasB | hasC | hasD | hasE | hasF |
+--------+------+------+------+------+------+------+
| foo | 1 | 0 | 1 | 0 | 0 | 0 |
| bar | 1 | 1 | 0 | 0 | 1 | 1 |
| baz | 1 | 1 | 1 | 0 | 0 | 0 |
+--------+------+------+------+------+------+------+
I'm quite convinced most of those 20k records have characteristics that fall into one of several categories. There must be a way to determine how similar record 'foo' is to record 'bar'.
So, what is it that I'm actually looking at? What algorithm am I looking for?
Transform each record r into a binary vector v(r) so that the i-th component of v(r) is set to 1 if r has the i-th characteristic, and 0 otherwise.
Now run hierarchical clustering algorithm on this set of vectors under the Hamming distance or Jaccard distance, whichever you think is more appropriate; also make sure there's a notion of distance between clusters defined in terms of the underlying distance (see linkage criteria).
Then decide where to cut the resulting dendrogram based on common sense. Where you cut the dendrogram will affect the number of clusters.
One downside of hierarchical clustering is that it's rather slow. It takes O(n^3) time in general, so it would take quite a while on a large data set. For single- and complete-linkages you can bring the time down to O(n^2).
Hierarchical clustering is very easy to implement in languages such as Python. You can also use the implementation from the scipy library.
Example: Hierarchical Clustering in Python
Here's a code snippet to get you started. I assume S is the set of records transformed into binary vectors (i.e. each list in S corresponds to a record from your data set).
import numpy as np
import scipy
import scipy.cluster.hierarchy as sch
import matplotlib.pylab as plt
# This is the set of binary vectors, each of which would
# correspond to a record in your case.
S = [
    [0, 0, 0, 1, 1],  # 0
    [0, 0, 0, 0, 1],  # 1
    [0, 0, 0, 1, 0],  # 2
    [1, 1, 1, 0, 0],  # 3
    [1, 0, 1, 0, 0],  # 4
    [0, 1, 1, 0, 0]]  # 5
# Use Hamming distance with complete linkage.
Z = sch.linkage(sch.distance.pdist(S, metric='hamming'), 'complete')
# Compute the dendrogram
P = sch.dendrogram(Z)
plt.show()
The result is as you'd expect: cut at 0.5 to get two clusters, one containing the first three vectors (which have ones at the beginning and zeros at the end) and the other containing the last three vectors (which have ones at the end and zeros at the beginning).
Hierarchical clustering starts with each vector being its own cluster. In each successive step it merges the two closest clusters. It repeats this until a single cluster is left.
The dendrogram essentially encodes the whole clustering process. At the beginning each vector is its own cluster. Then {3} and {5} merge into {3,5} and {0} and {2} merge into {0,2}. Next, {4} and {3,5} merge into {3,4,5}, and {1} and {0,2} merge into {0,1,2}. Finally, {0,1,2} and {3,4,5} merge into {0,1,2,3,4,5}.
From the dendrogram you can usually see at which point it makes the most sense to cut---this will define your clusters.
I encourage you to experiment with various distances (e.g. Hamming distance, Jaccard distance) and linkages (e.g. single linkage, complete linkage), and various representations (e.g. binary vectors).
Are you sure you want cluster analysis?
To find similar records you don't need cluster analysis. Simply find similar records with any distance measure such as Jaccard similarity or Hamming distance (both of which are for binary data). Or use cosine distance, so that you can use e.g. Lucene to find similar records fast.
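For instance, here is a small sketch of that no-clustering approach, reusing the toy binary vectors S from the other answer (SciPy's built-in Jaccard distance; swap in 'hamming' if you prefer):
import numpy as np
from scipy.spatial.distance import cdist

# Toy binary records (rows), same as S in the other answer.
S = np.array([
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
])

D = cdist(S, S, metric="jaccard")   # pairwise Jaccard distances
np.fill_diagonal(D, np.inf)         # ignore each record's distance to itself
nearest = D.argmin(axis=1)          # index of the most similar record for each record
print(nearest)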
To find common patterns, the use of frequent itemset mining may yield much more meaningful results, because these can work on a subset of attributes only. For example, in a supermarket, the columns Noodles, Tomato, Basil, Cheese may constitute a frequent pattern.
Most clustering algorithms attempt to divide the data into k groups. While this at first appears a good idea (get k target groups) it rarely matches what real data contains. For example customers: why would every customer belong to exactly one audience? What if the audiences are e.g. car lovers, gun lovers, football lovers, soccer moms - are you sure you don't want to allow overlap of these groups?
Furthermore, a problem with cluster analysis is that it's incredibly easy to use badly. It does not "fail hard": you always get a result, and you might not realize that it's a bad result...