How to get the decision function from svm_model

Say I have a feature vector [v1,v2,v3],
then I have a decision function a*v1 + b*v2 + c*v3 = d.
How do I get the values (a, b, c, d) using the information in svm_model?
I saw these two fields in svm_model:
public double[][] sv_coef;// coefficients for SVs in decision functions (sv_coef[k-1][l])
public double[] rho;// constants in decision functions (rho[k*(k-1)/2])
I suspect they are essential for getting the decision function.

There is also an SVs field in svm_model. Your decision function is w·v + b = 0, where v = [v1, v2, v3]. Then (in MATLAB notation):
w = model.SVs' * model.sv_coef;
b = -model.rho;
For multi-class SVM, you may also need another field called Label:
if model.Label(1) == -1
    w = -w;
    b = -b;
end
Check the FAQ part for more details.
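For illustration, here is the same reconstruction in Python with scikit-learn, whose SVC exposes analogous fields (support_vectors_, dual_coef_, and intercept_ in place of -rho); this is a sketch assuming a linear kernel, not the Java libsvm API itself:
import numpy as np
from sklearn.svm import SVC

X = np.array([[0., 0., 0.], [0., 1., 0.], [1., 0., 1.], [1., 1., 1.]])
y = np.array([-1, -1, 1, 1])
clf = SVC(kernel="linear").fit(X, y)

# w = SVs' * sv_coef (here dual_coef_ @ support_vectors_), and the bias is intercept_,
# so the decision function is a*v1 + b*v2 + c*v3 + intercept_ = 0, i.e. d = -intercept_.
w = clf.dual_coef_ @ clf.support_vectors_
print(w, clf.intercept_)
print(np.allclose(clf.decision_function(X), X @ w.ravel() + clf.intercept_))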

Approach for Linearizing Nonlinear System around Decision Variables (x, u) from MathematicalProgram at each discrete point in time

This is a follow-up question on the same system from the following post (for additional context). I describe a nonlinear system as a LeafSystem_[T] (using templates) with two input ports and one output port. I would then essentially like to perform direct transcription using MathematicalProgram with an additional cost function that depends on the linearized dynamics at each time step (and hence is linearized around the decision variables). I use two input ports because it seemed the most straightforward way of obtaining linearized dynamics of the form used in this paper on DIRTREL (if I can take the Jacobian with respect to the input ports):
δx_{i+1} ≈ A_i δx_i + B_i δu_i + G_i w_i
where i is the timestep, x is the state, the first input port can encapsulate u, and the second input port can model w, which may be disturbance, uncertainty, etc.
My main question is: what would be the most suitable set of methods for obtaining the linearized dynamics around the decision variables at each time step using automatic differentiation? I was recommended automatic differentiation after attempting a symbolic approach in the previous post, but am not familiar with the setup for doing so. I have experimented with
- using primitives.Linearize() (calling it twice, once for each input port), which feels rather clunky, and I am not sure whether it is possible to pass decision variables into the context
- perhaps converting my system into a multibody and making use of multibody.tree.JacobianWrtVariable()
- or formatting my system dynamics so that I can pass them in as the function argument for forwarddiff.jacobian
but have had limited success.
The easiest way to get Ai, Bi is to instantiate your system with AutoDiffXd, namely LeafSystem<AutoDiffXd>. The following code will give you Ai, Bi, Gi:
MyLeafSystem<AutoDiffXd> my_system;
Eigen::VectorXd x_val = ...
Eigen::VectorXd u_val = ...
Eigen::VectorXd w_val = ...
// xuw_val concatenates x_val, u_val and w_val
Eigen::VectorXd xuw_val(x_val.rows() + u_val.rows() + w_val.rows());
xuw_val.head(x_val.rows()) = x_val;
xuw_val.segment(x_val.rows(), u_val.rows()) = u_val;
xuw_val.tail(w_val.rows()) = w_val;
// xuw_autodiff stores xuw_val in its value(), and an identity matrix in its gradient()
AutoDiffVecXd xuw_autodiff = math::initializeAutoDiff(xuw_val);
AutoDiffVecXd x_autodiff = xuw_autodiff.head(x_val.rows());
AutoDiffVecXd u_autodiff = xuw_autodiff.segment(x_val.rows(), u_val.rows());
AutoDiffVecXd w_autodiff = xuw_autodiff.tail(w_val.rows());
// I suppose you have a function x[n+1] = dynamics(system, x[n], u[n], w[n]). This dynamics function could be a wrapper of CalcUnrestrictedUpdate function.
AutoDiffVecXd x_next_autodiff = dynamics(my_system, x_autodiff, u_autodiff, w_autodiff);
Eigen::MatrixXd x_next_gradient = math::autoDiffToGradientMatrix(x_next_autodiff);
Eigen::MatrixXd Ai = x_next_gradient.block(0, 0, x_val.rows(), x_val.rows());
Eigen::MatrixXd Bi = x_next_gradient.block(0, x_val.rows(), x_val.rows(), u_val.rows());
Eigen::MatrixXd Gi = x_next_gradient.block(0, x_val.rows() + u_val.rows(), x_val.rows(), w_val.rows());
So you get the value of Ai, Bi, Gi in the end.
If you need to write a cost function, you will need to create a subclass of solvers::Cost. Inside the Eval function of this derived class, you will implement your code to first compute Ai, Bi, Gi, and then integrate the Riccati equation.
But I think since your cost function depends on Ai, Bi, Gi, the gradient of your cost function will depend on the gradient of Ai, Bi, Gi. Currently we don't provide the function to compute the second order gradient of the dynamics.
How complicated is your dynamical system? Is it possible to write down the dynamics by hand? If so, there are some shortcuts we can do to generate the second order gradient of your dynamics.
#sherm or other Drake dynamics folks, it would be great to get your opinion on how to get the second order gradient (assuming Phil could confirm he does need the second order gradient.)
Sorry for my belated reply.
Since your dynamics can be written by hand, I would suggest creating a templated function to compute Ai, Bi, Gi:
template <typename T>
void ComputeLinearizedDynamics(
    const LeafSystem<T>& my_system,
    const Eigen::Ref<const drake::VectorX<T>>& x,
    const Eigen::Ref<const drake::VectorX<T>>& u,
    drake::MatrixX<T>* Ai,
    drake::MatrixX<T>* Bi,
    drake::MatrixX<T>* Gi);
You will need to write down the matrices Ai, Bi, Gi by hand within this function. Then, when you instantiate your LeafSystem with T=AutoDiffXd, this function will compute Ai, Bi, Gi together with their gradients, given the state x, input u and disturbance w.
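As a language-agnostic illustration of "writing Ai, Bi, Gi down by hand", here is a toy pendulum example in plain Python (not Drake code; the model, names and values are all assumptions for the example):
import numpy as np

g, l, m, dt = 9.81, 1.0, 1.0, 0.01

def linearized_dynamics(x, u, w):
    # Forward-Euler step of a pendulum: x = [theta, thetadot], u is a torque,
    # w an additive torque disturbance, f_c = [thetadot, -g/l*sin(theta) + (u + w)/(m*l**2)].
    theta, thetadot = x
    Ac = np.array([[0.0, 1.0],
                   [-g / l * np.cos(theta), 0.0]])   # d f_c / d x
    Bc = np.array([[0.0], [1.0 / (m * l**2)]])       # d f_c / d u
    Gc = np.array([[0.0], [1.0 / (m * l**2)]])       # d f_c / d w
    # Discretize: x_{k+1} = x_k + dt * f_c, so A = I + dt*Ac, B = dt*Bc, G = dt*Gc.
    return np.eye(2) + dt * Ac, dt * Bc, dt * Gc

A, B, G = linearized_dynamics(np.array([0.3, 0.0]), 0.0, 0.0)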
Then, in the cost function, you could consider creating a subclass of the Cost class:
class MyCost : public solvers::Cost {
 public:
  // (Arguments to the solvers::Cost base-class constructor are omitted here for brevity.)
  MyCost(const LeafSystem<AutoDiffXd>& my_system) : my_system_{&my_system} {}

 protected:
  void DoEval(const Eigen::Ref<const Eigen::VectorXd>& x_input, Eigen::VectorXd* y) const {
    // The computation here is inefficient: we cast x_input to an Eigen vector of
    // AutoDiffXd, call the AutoDiffXd version of DoEval, and then convert the
    // result back to double. But it is easy to implement.
    const AutoDiffVecXd x_autodiff = math::initializeAutoDiff(x_input);
    AutoDiffVecXd y_autodiff;
    this->DoEval(x_autodiff, &y_autodiff);
    *y = math::autoDiffToValueMatrix(y_autodiff);
  }

  void DoEval(const Eigen::Ref<const drake::AutoDiffVecXd>& x_input, drake::AutoDiffVecXd* y) const {
    // x_input contains the whole state and control sequence; first partition it into x and u.
    const drake::AutoDiffVecXd x_all = x_input.head(num_x_ * nT_);
    const drake::AutoDiffVecXd u_all = x_input.tail(num_u_ * nT_);
    y->resize(1);
    (*y)(0) = 0;
    // I assume S_final_ is stored in this class.
    drake::MatrixX<AutoDiffXd> S = S_final_.cast<AutoDiffXd>();
    for (int i = nT_ - 1; i >= 0; --i) {
      drake::MatrixX<AutoDiffXd> Ai, Bi, Gi;
      ComputeLinearizedDynamics(
          *my_system_,
          x_all.segment(num_x_ * i, num_x_),
          u_all.segment(num_u_ * i, num_u_),
          &Ai, &Bi, &Gi);
      S = Ai.transpose() * S + S * Ai + ...  // This is the Riccati equation (left as a sketch).
      // Now compute your cost with this S
      ...
    }
  }

  void DoEval(const Eigen::Ref<const VectorX<symbolic::Variable>>& x, VectorX<symbolic::Expression>* y) const {
    // You don't need the symbolic version of the cost in nonlinear optimization.
    throw std::runtime_error("Not implemented yet");
  }

 private:
  const LeafSystem<AutoDiffXd>* my_system_;
};
The AutoDiffXd version of DoEval will compute the gradient of the cost for you automatically. Then you will need to call the AddCost function of MathematicalProgram to add this cost, with all of x, u as the variables associated with this cost.
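For illustration, here is a minimal pydrake sketch of that last step (dimensions and the placeholder cost body are assumptions, the import path may differ across Drake versions, and the C++ call is analogous):
import numpy as np
from pydrake.solvers import MathematicalProgram, Solve  # older versions: pydrake.solvers.mathematicalprogram

nT, num_x, num_u = 20, 4, 2                       # assumed horizon and dimensions
prog = MathematicalProgram()
x = prog.NewContinuousVariables(num_x * nT, "x")
u = prog.NewContinuousVariables(num_u * nT, "u")

def my_cost(xu):
    # xu arrives as AutoDiffXd when gradients are requested; in practice you would
    # rebuild Ai, Bi, Gi here and integrate the Riccati recursion as sketched above.
    return xu.dot(xu)                             # placeholder cost

prog.AddCost(my_cost, vars=np.concatenate([x, u]))
result = Solve(prog)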

How to get class labels from TensorFlow prediction

I have a classification model in TF and can get a list of probabilities for the next class (preds). Now I want to select the highest element (argmax) and display its class label.
This may seem silly, but how can I get the class label that matches a position in the predictions tensor?
feed_dict={g['x']: current_char}
preds, state = sess.run([g['preds'],g['final_state']], feed_dict)
prediction = tf.argmax(preds, 1)
preds gives me a vector of predictions for each class. Surely there must be an easy way to just output the most likely class (label)?
Some info about my model:
x = tf.placeholder(tf.int32, [None, num_steps], name='input_placeholder')
y = tf.placeholder(tf.int32, [None, 1], name='labels_placeholder')
batch_size = tf.shape(x)[0]
x_one_hot = tf.one_hot(x, num_classes)
rnn_inputs = [tf.squeeze(i, squeeze_dims=[1]) for i in
tf.split(x_one_hot, num_steps, 1)]
tmp = tf.stack(rnn_inputs)
print(tmp.get_shape())
tmp2 = tf.transpose(tmp, perm=[1, 0, 2])
print(tmp2.get_shape())
rnn_inputs = tmp2
with tf.variable_scope('softmax'):
W = tf.get_variable('W', [state_size, num_classes])
b = tf.get_variable('b', [num_classes], initializer=tf.constant_initializer(0.0))
rnn_outputs = rnn_outputs[:, num_steps - 1, :]
rnn_outputs = tf.reshape(rnn_outputs, [-1, state_size])
y_reshaped = tf.reshape(y, [-1])
logits = tf.matmul(rnn_outputs, W) + b
predictions = tf.nn.softmax(logits)
A prediction is an array of n values, one per class (label). It represents the model's "confidence" that the input corresponds to each of its classes (labels). You can check which label has the highest confidence value by using:
prediction = np.argmax(preds, 1)
After getting the index of the highest element with argmax, you index into your class labels to find the exact class name associated with that index.
class_names[prediction]
Please refer to this link for more understanding.
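Putting those two steps together in a minimal, self-contained sketch (the class names and probabilities here are made up for illustration):
import numpy as np

class_names = ["class_a", "class_b", "class_c"]        # your labels, in index order
preds = np.array([[0.1, 0.7, 0.2]])                    # one row of softmax output
predicted_index = int(np.argmax(preds, axis=1)[0])     # index of the most likely class
print(class_names[predicted_index])                    # -> class_b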
You can use tf.reduce_max() for this. I would refer you to this answer.
Let me know if it works - will edit if it doesn't.
Mind that there are sometimes several ways to load a dataset. For instance, with Fashion-MNIST the tutorial could lead you to use load_data() and then to create your own structure to interpret a prediction. However, you can also load the data using tensorflow_datasets.load(...) (after installing tensorflow-datasets), which gives you access to some DatasetInfo. So, for instance, if your prediction is 9, you can tell it's a boot with:
import tensorflow_datasets as tfds
_, ds_info = tfds.load('fashion_mnist', with_info=True)
print(ds_info.features['label'].names[9])
When you use softmax, the labels you train the model on are either numbers 0..n or one-hot encoded values. So if the original labels of your data are, say, string names, you must map them to integers first and keep the mapping as a variable (such as 0 -> "apple", 1 -> "orange", 2 -> "pear", ...).
When using integers (with loss='sparse_categorical_crossentropy'), you get predictions as an array of probabilities; you just find the array index with the max value, and then use this predicted index to reverse-map to your label:
predictedIndex = np.argmax(predictions)           # e.g. 2
predictedLabel = indexToLabelMap[predictedIndex]  # e.g. "pear"
If you use one-hot encoded labels (with loss='categorical_crossentropy'), the predicted index corresponds with the "hot" index of your label.
Just for reference, I needed this info when I was working with MNIST dataset used in Google's Machine learning crash course. There is also a good classification tutorial in the Tensorflow docs.
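Putting the mapping idea above into a runnable sketch (the fruit names are just the example labels from above):
import numpy as np

labels = ["apple", "orange", "pear"]
labelToIndexMap = {name: i for i, name in enumerate(labels)}   # used to build integer training targets
indexToLabelMap = {i: name for i, name in enumerate(labels)}   # used to decode predictions
predictions = np.array([0.1, 0.2, 0.7])                        # softmax output for one sample
print(indexToLabelMap[int(np.argmax(predictions))])            # -> pear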

Polynomial regression in spark/ or external packages for spark

After investing a good amount of time searching the net on this topic, I am ending up here hoping to get some pointers. Please read further.
After analyzing Spark 2.0 I concluded that polynomial regression is not possible with Spark alone, so is there some extension to Spark which can be used for polynomial regression?
- With RSpark it could be done (but I am looking for a better alternative)
- RFormula in Spark does prediction but the coefficients are not available (which is my main requirement, as I am primarily interested in the coefficient values)
Polynomial regression is just another case of linear regression (as in Polynomial regression is linear regression and Polynomial regression). As Spark has a method for linear regression, you can call that method after changing the inputs in such a way that the new inputs are the ones suited to polynomial regression. For instance, if you only have one independent variable x and you want to do quadratic regression, you have to change your independent input matrix to [x x^2].
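To make the idea concrete, here is a minimal sketch outside Spark (plain NumPy/scikit-learn), showing that fitting a quadratic is just a linear regression on the expanded inputs [x, x^2]:
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 1.5 * x - 0.5 * x ** 2                 # a known quadratic, for checking
X_poly = np.column_stack([x, x ** 2])            # the expanded input matrix [x, x^2]
model = LinearRegression().fit(X_poly, y)
print(model.intercept_, model.coef_)             # recovers ~2.0 and ~[1.5, -0.5]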
I would like to add some information to #Mehdi Lamrani’s answer:
If you want to do a polynomial linear regression in SparkML, you may use the class PolynomialExpansion.
For information check the class in the SparkML Doc
or in the Spark API Doc
Here is an implementation example:
Let's assume we have train and test datasets, stored in two CSV files, with headers containing the names of the columns (features, label).
Each dataset contains three features named f1, f2, f3, each of type Double (this is the X matrix), as well as a label feature (the Y vector) named myLabel.
For this code I used Spark+Scala:
Scala version : 2.12.8
Spark version 2.4.0.
We assume that the SparkML library was already added as a dependency in build.sbt.
First of all, import the libraries:
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.feature.{PolynomialExpansion, VectorAssembler}
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.mllib.evaluation.RegressionMetrics
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.udf
import org.apache.spark.{SparkConf, SparkContext}
Create the Spark session and Spark context:
val ss = org.apache.spark.sql
.SparkSession.builder()
.master("local")
.appName("Read CSV")
.enableHiveSupport()
.getOrCreate()
val conf = new SparkConf().setAppName("test").setMaster("local[*]")
val sc = new SparkContext(conf)
Instantiate the variables you are going to use:
val f_train:String = "path/to/your/train_file.csv"
val f_test:String = "path/to/your/test_file.csv"
val degree:Int = 3 // Set the degree of your choice
val maxIter:Int = 10 // Set the max number of iterations
val lambda:Double = 0.0 // Set your lambda
val alpha:Double = 0.3 // Set the learning rate
First, let's create several UDFs, which will be used for reading and pre-processing the data.
The type parameters of the udf toFeatures are the return type Vector, followed by the types of the feature arguments: (Double, Double, Double).
val toFeatures = udf[Vector, Double, Double, Double] {
(a,b,c) => Vectors.dense(a,b,c)
}
val encodeIntToDouble = udf[Double, Int](_.toDouble)
Now let's create a function which extracts data from the CSV and creates new features from the existing ones, using PolynomialExpansion:
def getDataPolynomial(
currentfile:String,
sc:SparkSession,
sco:SparkContext,
degree:Int
):DataFrame =
{
val df_rough:DataFrame = sc.read
.format("csv")
.option("header", "true") //first line in file has headers
.option("mode", "DROPMALFORMED")
.option("inferSchema", value=true)
.load(currentfile)
.toDF("f1", "f2", "f3", "myLabel")
// you may or may not keep the .toDF line above, depending on your CSV headers
val df:DataFrame = df_rough
.withColumn("featNormTemp", toFeatures(df_rough("f1"), df_rough("f2"), df_rough("f3")))
.withColumn("label", encodeIntToDouble(df_rough("myLabel")))
val polyExpansion = new PolynomialExpansion()
.setInputCol("featNormTemp")
.setOutputCol("polyFeatures")
.setDegree(degree)
val polyDF:DataFrame=polyExpansion.transform(df.select("featNormTemp"))
val datafixedWithFeatures:DataFrame = polyDF.withColumn("features", polyDF("polyFeatures"))
val datafixedWithFeaturesLabel = datafixedWithFeatures
.join(df,df("featNormTemp") === datafixedWithFeatures("featNormTemp"))
.select("label", "polyFeatures")
datafixedWithFeaturesLabel
}
Now, run the function both for the train and test datasets, using the chosen degree for the Polynomial expansion.
val X:DataFrame = getDataPolynomial(f_train,ss,sc,degree)
val X_test:DataFrame = getDataPolynomial(f_test,ss,sc,degree)
Run the algorithm in order to get a linear regression model, using a pipeline:
val assembler = new VectorAssembler()
.setInputCols(Array("polyFeatures"))
.setOutputCol("features2")
val lr = new LinearRegression()
.setMaxIter(maxIter)
.setRegParam(lambda)
.setElasticNetParam(alpha)
.setFeaturesCol("features2")
.setLabelCol("label")
// Fit the model:
val pipeline:Pipeline = new Pipeline().setStages(Array(assembler,lr))
val lrModel:PipelineModel = pipeline.fit(X)
// Get predictions on the test set:
val result:DataFrame = lrModel.transform(X_test)
Finally, evaluate the result using the root-mean-squared error:
def leastSquaresError(result:DataFrame):Double = {
val rm:RegressionMetrics = new RegressionMetrics(
result
.select("label","prediction")
.rdd
.map(x => (x(0).asInstanceOf[Double], x(1).asInstanceOf[Double])))
Math.sqrt(rm.meanSquaredError)
}
val error:Double = leastSquaresError(result)
println("Error : "+error)
I hope this might be useful!

Calculating vector distance for classification with mixed features

I'm doing a project comparing the effectiveness of various classification algorithms, but I'm stuck on a frustrating point. The data may be found here: http://archive.ics.uci.edu/ml/datasets/Adult The classification problem is whether or not a person makes over 50k a year based on their census data.
Two example entries are as follows:
45, Private, 98092, HS-grad, 9, Married-civ-spouse, Sales, Husband, White, Male, 0, 0, 60, United-States, <=50K
50, Self-emp-not-inc, 386397, Bachelors, 13, Married-civ-spouse, Sales, Husband, White, Male, 0, 0, 60, United-States, <=50K
I'm familiar with using Euclidean distance to calculate the difference between vectors, but I'm not sure how to work with a mix of continuous and discrete attributes. Are there any effective methods for representing the difference between two vectors in a meaningful way? I'm having a hard time wrapping my head around how large values like the third attribute (a weight calculated by the people who extracted the data set, so that similar weights should indicate similar attributes) and differences in it can be weighed meaningfully against discrete features like male or female, which differ by a Euclidean distance of only 1 if I understand the method correctly. I'm sure some categories could be removed, but I don't want to remove something that factors into classification significantly. I'm tackling k-NN first once I get this figured out, then a Bayesian classifier, and finally a decision tree model like C4.5 or ID3 if I have the time.
Sure, you can extend Euclidean distance in any number of ways. The simplest extension would be the following rule:
distance = 0 in that coordinate if there's a match, 1 otherwise
The challenge will be making the concept of distance "relevant" for the k-NN follow-up. In some cases (e.g. education), I think it will be best to map education (a discrete variable) to a continuous variable, such as years of education. So you'll need to write a function which maps e.g. "HS-grad" to 12, "Bachelors" to 16, something like that.
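For example, a literal sketch of that mapping in Python (only the two values named above come from this answer; the remaining education levels would need to be filled in similarly):
education_years = {"HS-grad": 12, "Bachelors": 16}   # extend with the remaining levels

def education_to_years(level, default=12):
    return education_years.get(level, default)

print(education_to_years("Bachelors"))               # -> 16
Note that the example rows in the question already carry an education-num field (9 for HS-grad, 13 for Bachelors), which could serve the same purpose.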
Beyond that, using k-NN directly isn't going to work because the idea of "distance" among multiple dis-similar dimensions isn't well defined. I think you'll be better off throwing some of these dimensions away or weighting them differently. I don't know what the third number in your dataset (e.g. 98092) means, but if you use naive Euclidean distance this would be extremely overweighted compared to other dimensions such as age.
I'm not a machine learning expert, but I would personally be tempted to start k-NN on a reduced dimensionality dataset where you just pick some broad demographics (e.g. age, education, marital status) and ignore the trickier/"noisier" categories.
You need to code your categorical variables as 1-of-n binary variables (n choices for the variable, and of those variables one and only one is active). Then standardise your features---for each feature, subtract its mean and divide by standard deviation. Or normalise into the range 0-1. It's not perfect, but this will at least make dimensions comparable.
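A small sketch of the 1-of-n encoding plus standardisation described above, using pandas and scikit-learn (the column names are taken from the example rows in the question; this is illustrative, not a full pipeline):
import pandas as pd
from scipy.spatial.distance import euclidean
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "age": [45, 50],
    "education": ["HS-grad", "Bachelors"],
    "sex": ["Male", "Male"],
})
encoded = pd.get_dummies(df, columns=["education", "sex"])      # 1-of-n binary variables
scaled = StandardScaler().fit_transform(encoded.astype(float))  # zero mean, unit variance per feature
print(euclidean(scaled[0], scaled[1]))                          # distance between the two rows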
Create an individual Map for each categorical field and use the map to convert its values to doubles.
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

def createMap(data: RDD[String]) : Map[String,Double] = {
var mapData:Map[String,Double] = Map()
var counter = 0.0
data.collect().foreach{ item =>
counter = counter +1
mapData += (item -> counter)
}
mapData
}
def getLablelValue(input: String): Int = input match {
case "<=50K" => 0
case ">50K" => 1
}
val census = sc.textFile("/user/cloudera/census_data.txt")
val orgTypeRdd = census.map(line => line.split(", ")(1)).distinct
val gradeTypeRdd = census.map(line => line.split(", ")(3)).distinct
val marStatusRdd = census.map(line => line.split(", ")(5)).distinct
val jobTypeRdd = census.map(line => line.split(", ")(6)).distinct
val familyStatusRdd = census.map(line => line.split(", ")(7)).distinct
val raceTypeRdd = census.map(line => line.split(", ")(8)).distinct
val genderTypeRdd = census.map(line => line.split(", ")(9)).distinct
val countryRdd = census.map(line => line.split(", ")(13)).distinct
val salaryRange = census.map(line => line.split(", ")(14)).distinct
val orgTypeMap = createMap(orgTypeRdd)
val gradeTypeMap = createMap(gradeTypeRdd)
val marStatusMap = createMap(marStatusRdd)
val jobTypeMap = createMap(jobTypeRdd)
val familyStatusMap = createMap(familyStatusRdd)
val raceTypeMap = createMap(raceTypeRdd)
val genderTypeMap = createMap(genderTypeRdd)
val countryMap = createMap(countryRdd)
val salaryRangeMap = createMap(salaryRange)
val featureVector = census.map{line =>
val fields = line.split(", ")
LabeledPoint(getLablelValue(fields(14).toString),
  Vectors.dense(
    fields(0).toDouble, orgTypeMap(fields(1).toString), fields(2).toDouble,
    gradeTypeMap(fields(3).toString), fields(4).toDouble, marStatusMap(fields(5).toString),
    jobTypeMap(fields(6).toString), familyStatusMap(fields(7).toString), raceTypeMap(fields(8).toString),
    genderTypeMap(fields(9).toString), fields(10).toDouble, fields(11).toDouble,
    fields(12).toDouble, countryMap(fields(13).toString), salaryRangeMap(fields(14).toString)))
}

WEKA classification likelihood of the classes

I would like to know if there is a way in WEKA to output a number of 'best-guesses' for a classification.
My scenario is: I classify the data with cross-validation, for instance, and then in Weka's output I would get something like: these are the 3 best guesses for the classification of this instance. What I want is that, even if an instance isn't correctly classified, I get an output of the 3 or 5 best guesses for that instance.
Example:
Classes: A,B,C,D,E
Instances: 1...10
And output would be:
instance 1 is 90% likely to be class A, 75% likely to be class B, 60% likely to be class C, ...
Thanks.
Weka's API has a method called Classifier.distributionForInstance() that can be used to get the classification prediction distribution. You can then sort the distribution by decreasing probability to get your top-N predictions.
Below is a function that prints out: (1) the test instance's ground truth label; (2) the predicted label from classifyInstance(); and (3) the prediction distribution from distributionForInstance(). I have used this with J48, but it should work with other classifiers.
The input parameters are the serialized model file (which you can create during the model training phase by applying the -d option) and the test file in ARFF format.
public void test(String modelFileSerialized, String testFileARFF)
throws Exception
{
// Deserialize the classifier.
Classifier classifier =
(Classifier) weka.core.SerializationHelper.read(
modelFileSerialized);
// Load the test instances.
Instances testInstances = DataSource.read(testFileARFF);
// Mark the last attribute in each instance as the true class.
testInstances.setClassIndex(testInstances.numAttributes()-1);
int numTestInstances = testInstances.numInstances();
System.out.printf("There are %d test instances\n", numTestInstances);
// Loop over each test instance.
for (int i = 0; i < numTestInstances; i++)
{
// Get the true class label from the instance's own classIndex.
String trueClassLabel =
testInstances.instance(i).toString(testInstances.classIndex());
// Make the prediction here.
double predictionIndex =
classifier.classifyInstance(testInstances.instance(i));
// Get the predicted class label from the predictionIndex.
String predictedClassLabel =
testInstances.classAttribute().value((int) predictionIndex);
// Get the prediction probability distribution.
double[] predictionDistribution =
classifier.distributionForInstance(testInstances.instance(i));
// Print out the true label, predicted label, and the distribution.
System.out.printf("%5d: true=%-10s, predicted=%-10s, distribution=",
i, trueClassLabel, predictedClassLabel);
// Loop over all the prediction labels in the distribution.
for (int predictionDistributionIndex = 0;
predictionDistributionIndex < predictionDistribution.length;
predictionDistributionIndex++)
{
// Get this distribution index's class label.
String predictionDistributionIndexAsClassLabel =
testInstances.classAttribute().value(
predictionDistributionIndex);
// Get the probability.
double predictionProbability =
predictionDistribution[predictionDistributionIndex];
System.out.printf("[%10s : %6.3f]",
predictionDistributionIndexAsClassLabel,
predictionProbability );
}
System.out.printf("\n");
}
}
I don't know if you can do it natively, but you can just get the probabilities for each class, sort them and take the first three.
The function you want is distributionForInstance(Instance instance) which returns a double[] giving the probability for each class.
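As a language-agnostic illustration of the sorting step (plain NumPy, not the Weka API; the class names and probabilities are made up):
import numpy as np

class_names = ["A", "B", "C", "D", "E"]
dist = np.array([0.50, 0.25, 0.15, 0.07, 0.03])   # e.g. the double[] from distributionForInstance()
top3 = np.argsort(dist)[::-1][:3]                 # indices of the three most likely classes
for idx in top3:
    print(class_names[idx], dist[idx])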
Not in general. The information you want is not available with all classifiers -- in most cases (for example for decision trees), the decision is clear (albeit potentially incorrect) without a confidence value. Your task requires classifiers that can handle uncertainty (such as the naive Bayes classifier).
Technically the easiest thing to do is probably to train the model and then classify an individual instance, for which Weka should give you the desired output. In general you can of course also do it for sets of instances, but I don't think that Weka provides this out of the box. You would probably have to customise the code or use it through an API (for example in R).
When you calculate a probability for the instance, how exactly do you do this?
I have posted my PART rules and data for the new instance here, but as far as calculating it manually I am not so sure how to do this! Thanks.
EDIT: now calculated:
private float[] getProbDist(String split){
    // takes in something such as (52/2), meaning 52 instances correctly
    // classified and 2 incorrectly classified.
    String[] prob_dis = split.replace("(", "").replace(")", "").split("/");
    if (prob_dis.length > 2)
        return null;
    if (prob_dis.length == 1){
        // e.g. "(52)": no misclassified count was reported, so assume 0
        prob_dis = new String[]{prob_dis[0], "0"};
    }
    float correct = Float.parseFloat(prob_dis[0]);
    float incorrect = Float.parseFloat(prob_dis[1]);
    // assumes two tags
    float[] tag_prob = new float[2];
    tag_prob[0] = correct / (correct + incorrect);
    tag_prob[1] = 1 - tag_prob[0];
    // returns float[] holding the probabilities of the two tags
    return tag_prob;
}
