Converting a Weka classifier into a score

I'm trying to change my classifier's output from classifying instances as 0 or 1 to giving a score instead (a confidence measure?), say between 0 and 10.
I am using a Ridor classifier, but could just as easily use ClassificationViaRegression, RandomForest or AttributeSelectedClassifier, although they don't classify quite as well.
I have output everything I can to the terminal (all the options checked), but I can't find a confidence measure anywhere in the predictions. In addition, I understand that none of these has an option to output source code? In that case I'll have to code the classifiers manually.
Here is an example of the rules generated:
class = 2 (40536.0/20268.0)
Except (fog <= 14.115114) and (polySyllabicWords/Sentence <= 1.973684) and (polySyllabicWords/Sentence <= 1.245) and (Characters/Word > 4.331715) => class = 1 (2309.0/5.0) [1137.0/4.0]
Except (fog <= 14.115598) and (polySyllabicWords/Sentence <= 1.973684) and (polySyllabicWords/Sentence > 1.514706) => class = 1 (2281.0/0.0) [1112.0/0.0]
Except (fog <= 14.136126) and (Words/Sentence > 19.651515) and (polySyllableCount <= 10.5) and (polySyllabicWords/Sentence > 2.416667) and (Syllables/Sentence <= 34.875) => class = 1 (601.0/0.0) [303.0/6.0]
Except (fog <= 14.140863) and (polySyllabicWords/Sentence <= 1.944444) and (polySyllableCount <= 4.5) and (polySyllabicWords/Sentence <= 1.416667) and (wordCount > 29.5) and (Characters/Word <= 4.83156) => class = 1 (333.0/0.0) [152.0/0.0]
Except (fog <= 14.142217) and (polySyllabicWords/Sentence <= 1.944444) and (polySyllableCount <= 4.5) and (polySyllabicWords/Sentence <= 1.416667) and (numOfChars > 30.5) and (Syllables/Word <= 1.474937) => class = 1 (322.0/0.0) [174.0/4.0]
Except (fog <= 14.140863) and (polySyllabicWords/Sentence <= 1.75) and (polySyllableCount <= 4.5) => class = 1 (580.0/28.0) [298.0/21.0]
Except (fog <= 14.141508) and (Syllables/Sentence > 25.585714) and (Words/Sentence > 19.683333) and (sentenceCount <= 4.5) and (polySyllabicWords/Sentence <= 2.291667) and (fog > 12.269468) => class = 1 (434.0/0.0) [202.0/0.0]
Except (fog <= 14.140863) and (Syllables/Sentence > 25.866071) and (polySyllableCount <= 16.5) and (fog > 12.793102) and (polySyllabicWords/Sentence <= 2.9) and (wordCount <= 59.5) and (Words/Sentence > 16.166667) and (Words/Sentence <= 24.75) => class = 1 (291.0/0.0) [166.0/0.0]
Except (fog <= 14.140863) and (Syllables/Sentence > 25.585714) and (Words/Sentence > 19.630682) and (polySyllabicWords/Sentence > 2.656863) and (polySyllableCount <= 16.5) and (fog > 13.560337) and (Words/Sentence <= 21.55) and (numOfChars <= 523) => class = 1 (209.0/0.0) [93.0/2.0]
Except (fog <= 14.147578) and (Syllables/Word <= 1.649029) and (polySyllabicWords/Sentence <= 1.75) and (polySyllabicWords/Sentence > 1.303846) and (polySyllabicWords/Sentence <= 1.422619) and (fog > 9.327132) => class = 1 (183.0/0.0) [64.0/0.0]......
I am also unsure what the first line, class = 2 (40536.0/20268.0), means. Does it just mean "classify as 2 unless one of the following rules applies"?
Any help is much appreciated!

Generally, deriving confidence from classifiers is not regarded as an easy task, especially if you'd like it calibrated (i.e. expressed as the probability that the classification is correct). However, there are several relatively easy ways of getting rough estimates.
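With tree- and rule-based classifiers in Weka (J48 leaves, for example), the numbers in parentheses are counts on the training data: the first is the number of instances the rule covers, and the second is how many of those it misclassifies. So a bucket with (20.0/2.0) would mean the rule covered 20 training cases and was wrong on 2 of them (right on 18). You could use the fraction it got right as a rough measure of confidence. This also answers the question about the first line: class = 2 (40536.0/20268.0) is Ridor's default rule, which covers 40536 training instances and misclassifies 20268 of them, and each Except rule below it overrides the default for the instances it matches.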
When using regression, you can get WEKA to output the actual numeric result of the classifier (rather than just the class), and base a measure of confidence on it.
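More generally, according to the documentation, you can use the -p option on the command line to print a prediction for each test instance; adding -distribution prints the full class distribution instead of only the predicted class. However, I'm not certain how these numbers are calculated for each classifier.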
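A minimal Python sketch of turning rule counts into a 0 to 10 score, assuming the pair in parentheses means (instances covered / instances misclassified), as in Weka's J48 output, and that each instance is scored with the counts of the Ridor rule that fired for it. The regular expression, the function names and the optional Laplace correction (a standard smoothing for small counts, swapped in here, not something Weka reports) are illustrative assumptions. The same clip-and-scale step in `to_score` applies to any probability-like number, such as a regression output.

```python
import re

# Weka-style "(covered/misclassified)" counts, e.g. "(2309.0/5.0)".
# Assumption: first number = instances covered, second = instances misclassified.
COUNTS = re.compile(r"\((\d+(?:\.\d+)?)/(\d+(?:\.\d+)?)\)")


def rule_counts(rule_line):
    """Return (covered, misclassified) from the first numeric pair in a rule line."""
    match = COUNTS.search(rule_line)
    if match is None:
        raise ValueError("no (covered/misclassified) counts found")
    return float(match.group(1)), float(match.group(2))


def to_score(p, scale=10.0):
    """Clip a probability-like value to [0, 1] and map it onto 0..scale."""
    return scale * min(1.0, max(0.0, p))


def rule_score(covered, misclassified, laplace=False, scale=10.0):
    """Score a rule by the fraction of covered training instances it got right.

    With laplace=True, use the two-class Laplace estimate
    (correct + 1) / (covered + 2), which keeps small rules such as
    (183.0/0.0) from scoring a perfect 10.
    """
    if covered <= 0:
        raise ValueError("rule covers no instances")
    correct = covered - misclassified
    p = (correct + 1.0) / (covered + 2.0) if laplace else correct / covered
    return to_score(p, scale)


line = ("Except (fog <= 14.115598) and (polySyllabicWords/Sentence <= 1.973684) "
        "and (polySyllabicWords/Sentence > 1.514706) => class = 1 (2281.0/0.0) [1112.0/0.0]")
covered, wrong = rule_counts(line)
print(rule_score(covered, wrong))  # 10.0
print(rule_score(covered, wrong, laplace=True))
```

The regular expression only matches a parenthesised pair that starts with a digit, so attribute tests such as (polySyllabicWords/Sentence <= 1.973684) are skipped.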


dafny matrix expressions and functions

I'm trying to define a matrix transpose method and functions in Dafny. I'm having difficulty defining the function version.
/** verifies **/
method transpose(matrix: array2<real>) returns (result: array2<real>)
ensures result.Length0 == matrix.Length1 && result.Length1 == matrix.Length0
ensures forall i, j :: 0 <= i < matrix.Length1 && 0 <= j < matrix.Length0 ==> result[i,j] == matrix[j,i]
{
result := new real[matrix.Length1, matrix.Length0]((i,j) reads matrix => if 0 <= i < matrix.Length1 && 0 <= j < matrix.Length0 then matrix[j,i] else 0.0);
assert result.Length0 == matrix.Length1;
assert result.Length1 == matrix.Length0;
}
/** says it is an invalid LogicalExpresion**/
function ftranspose(matrix: array2<real>): array2<real>
reads matrix
ensures ftranspose(matrix).Length0 == matrix.Length1 && ftranspose(matrix).Length1 == matrix.Length0
ensures forall i, j :: 0 <= i < matrix.Length1 && 0 <= j < matrix.Length0 ==> ftranspose(matrix)[i,j] == matrix[j,i]
{
new real[matrix.Length1, matrix.Length0]((i,j) reads matrix => if 0 <= i < matrix.Length1 && 0 <= j < matrix.Length0 then matrix[j,i] else 0.0)
}
I'm not quite sure why it says it is an invalid logical expression since in the method I am able to assign it to a variable, which makes me assume that it is an expression.
I can see in the docs that
Array allocation is permitted in ghost contexts. If any expression used to specify a dimension or initialization value is ghost, then the new allocation can only be used in ghost contexts. Because the elements of an array are non-ghost, an array allocated in a ghost context in effect cannot be changed after initialization.
So it seems like I should be able to define a new array in a function. What is the correct syntax here?
Functions (even ghost functions) are not allowed to allocate memory or call methods, so calls to new cannot appear in function bodies.
This is because functions must be deterministic (return the same thing when called with the same arguments). As written, your function would return a different (fresh) object every time (reference types like arrays have reference equality, which means that they are the same if they live at the same address, not just if they have the same contents).
The passage you quoted is relevant for ghost methods, but does not apply to functions.
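As a Python analogy (not Dafny) for the point about reference equality: the hypothetical `fresh_transpose` below allocates a new list on every call, much as `new` would. Two calls produce equal contents but two distinct objects, which is why a function body containing `new` could not be deterministic when results are compared by reference.

```python
def fresh_transpose(m):
    """Return a newly allocated transpose of m (a list of equal-length rows)."""
    return [list(col) for col in zip(*m)]


m = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
a = fresh_transpose(m)
b = fresh_transpose(m)
print(a == b)  # True: same contents (value equality)
print(a is b)  # False: different objects (reference identity)
```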
So the answer is: (1) don't use array, which is heap-based, as Clément said; (2) use datatypes instead. The following verifies:
datatype Matrix = Matrice(vals: seq<seq<real>>, rows: nat, columns: nat)
predicate isMatrix(mat: Matrix) {
mat.rows >= 1 && mat.columns >= 1 && |mat.vals| == mat.rows && forall i :: 0 <= i < mat.rows ==> |mat.vals[i]| == mat.columns
}
function method seqTranspose(mat: Matrix): Matrix
requires isMatrix(mat)
ensures isMatrix(seqTranspose(mat))
ensures seqTranspose(mat).columns == mat.rows
ensures seqTranspose(mat).rows == mat.columns
// ensures seqTranpose(matrix).Length0 == matrix.Length1 && ftranspose(matrix).Length1 == matrix.Length0
ensures forall i, j :: 0 <= i < mat.columns && 0 <= j < mat.rows ==> seqTranspose(mat).vals[i][j] == mat.vals[j][i]
{
Matrice(seq(mat.columns, i requires 0 <= i < mat.columns => seq(mat.rows, j requires 0 <= j < mat.rows => mat.vals[j][i])), mat.columns, mat.rows)
}
lemma matTranspose(mat: Matrix)
requires isMatrix(mat)
ensures seqTranspose(seqTranspose(mat)) == mat
{
assert forall i :: 0 <= i < |mat.vals| ==> mat.vals[i] == seqTranspose(seqTranspose(mat)).vals[i];
}
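For comparison, a Python sketch of the same value-based design, using immutable tuples in place of seq<seq<real>> and deriving the row and column counts from the data instead of storing them (the names are hypothetical). Unlike the Dafny lemma, which proves seqTranspose(seqTranspose(mat)) == mat for every well-formed matrix, the Python check only tests a particular input.

```python
from typing import Tuple

Row = Tuple[float, ...]
Mat = Tuple[Row, ...]


def is_matrix(mat: Mat) -> bool:
    """Mirror of isMatrix: at least one row and one column, all rows equally long."""
    return len(mat) >= 1 and len(mat[0]) >= 1 and all(len(r) == len(mat[0]) for r in mat)


def seq_transpose(mat: Mat) -> Mat:
    """Pure transpose: result[i][j] == mat[j][i], as in the ensures clause."""
    if not is_matrix(mat):
        raise ValueError("not a well-formed matrix")
    rows, cols = len(mat), len(mat[0])
    return tuple(tuple(mat[j][i] for j in range(rows)) for i in range(cols))


m = ((1.0, 2.0, 3.0), (4.0, 5.0, 6.0))
t = seq_transpose(m)
assert t == ((1.0, 4.0), (2.0, 5.0), (3.0, 6.0))
assert seq_transpose(t) == m  # the property matTranspose proves in general
```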

In a Tableau dashboard I have a metric which I should display in thousands, millions and billions automatically

I have a measure in Tableau which has to be displayed dynamically in thousands (K), millions (M) and billions (B), and also formatted as 200K, 2726M.
You can try this formula, which you can extend to suit your needs:
str(
if SUM([Sales]) > 1000000000 THEN ROUND(SUM([Sales])/1000000000,1)
elseif SUM([Sales]) > 1000000 THEN ROUND(SUM([Sales])/1000000,1)
elseif SUM([Sales]) > 1000 THEN ROUND(SUM([Sales])/1000,1)
else SUM([Sales])
end )
+
if SUM([Sales]) > 1000000000 THEN 'B'
elseif SUM([Sales]) > 1000000 THEN 'M'
elseif SUM([Sales]) > 1000 THEN 'K'
else ''
end
See the screenshot for a quick example using the Superstore sample data.
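To check the branch logic outside Tableau, here is a Python sketch that mirrors the calculation above (the function name `abbreviate` is made up). Python prints 200.0 where Tableau's STR may format the number differently, so only the thresholds and suffixes are meant to match.

```python
def abbreviate(value: float) -> str:
    """Mirror of the Tableau calc: divide by 1e9/1e6/1e3, round to 1 decimal, add a suffix."""
    if value > 1_000_000_000:
        return f"{round(value / 1_000_000_000, 1)}B"
    if value > 1_000_000:
        return f"{round(value / 1_000_000, 1)}M"
    if value > 1_000:
        return f"{round(value / 1_000, 1)}K"
    return str(value)  # small values are shown unscaled, with no suffix


for v in (950, 200_000, 2_726_000, 2_726_000_000):
    print(v, abbreviate(v))
```

Because the comparisons are strict, a value of exactly 1,000 stays unscaled, and negative values always fall through to the last branch; comparing ABS(SUM([Sales])) in the conditions is one way to handle those. Note also that with these thresholds 2,726 million is shown as 2.7B rather than 2726M.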

How to compare negative numbers (2's complement) in verilog?

I have the following comparison statement in verilog, which works fine (positive comparison)
if((count_cc >= 13'b0000000011110)&&(count_cc <= 13'b0000000111100)) //30,60
begin
level=4'b0010; //level 2
end
However, when I use two's complement values, it does not work:
if((count_cc >= 13'b1111111100100)&&(count_cc <= 13'b1111111110110)) // -28, -10
begin
level=4'b0101; //level 5
end
Any guidance would be helpful.
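Assuming count_cc is already declared signed (for example reg signed [12:0] count_cc), use signed literals (note the s in 13'sb). Verilog only performs a signed comparison when both operands are signed; if either one is unsigned, both are compared as unsigned bit patterns.
if((count_cc >= 13'sb1111111100100)&&(count_cc <= 13'sb1111111110110)) // -28, -10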
begin
level=4'b0101; //level 5
end
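Since unsized decimal literals such as -28 are signed in Verilog, this also works (again assuming count_cc is signed), with the lower bound -28 first:
if((count_cc >= -28)&&(count_cc <= -10))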
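To confirm which values the 13-bit patterns encode, here is a small Python sketch of two's-complement decoding (the function names are hypothetical). It shows that 13'b1111111100100 is -28 and 13'b1111111110110 is -10, so -28 is the lower bound of the range.

```python
WIDTH = 13  # bit width of count_cc in the question


def to_signed(bits: int, width: int = WIDTH) -> int:
    """Interpret the low `width` bits as a two's-complement signed integer."""
    bits &= (1 << width) - 1  # keep only `width` bits
    sign_bit = 1 << (width - 1)
    return bits - (1 << width) if bits & sign_bit else bits


def in_level5_range(count_cc: int) -> bool:
    """Signed range check, like comparing a signed count_cc against -28 and -10."""
    value = to_signed(count_cc)
    return -28 <= value <= -10


print(to_signed(0b1111111100100))  # -28
print(to_signed(0b1111111110110))  # -10
print(to_signed(0b0000000011110))  # 30
```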

A* only works in certain cases

My A* pathfinding algorithm only works in certain cases, but I don't understand why. Every node in my grid is walkable, so in theory every path should work. I believe the error is in this line:
PathFindingNode *neighbor = NULL;
if ((y > 0 && x > 0) && (y < gridY - 1 && x < gridX - 1))
neighbor = [[grid objectAtIndex:x + dx] objectAtIndex:y +dy];
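In the method -(void)addNeighbors:, the lines
if ((y > 0 && x > 0) && (y < gridY - 1 && x < gridX - 1))
neighbor = [[grid objectAtIndex:x + dx] objectAtIndex:y +dy];
have a bug: the bounds check is applied to the current node, so when curNode is on the boundary of the grid, none of its neighbors are added to the queue. As a result, a search that starts on the boundary finds nothing, and (with horizontal and vertical moves only) the algorithm can never reach an endNode in one of the four corners (i.e. [0,0], [gridX-1,0], [0,gridY-1], [gridX-1,gridY-1]). The check should instead test the neighbor's coordinates, x + dx and y + dy, against 0 and gridX/gridY.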
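A Python sketch of neighbor generation with the bounds test moved onto each candidate cell (x + dx, y + dy), so boundary and corner nodes still get neighbors. The function name and the `diagonal` flag are hypothetical, since the question does not show whether diagonal moves are allowed.

```python
from typing import List, Tuple


def neighbors(x: int, y: int, grid_x: int, grid_y: int,
              diagonal: bool = False) -> List[Tuple[int, int]]:
    """Return the in-bounds neighbor coordinates of (x, y)."""
    result = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue  # skip the node itself
            if not diagonal and dx != 0 and dy != 0:
                continue  # 4-connected grid: no diagonal moves
            nx, ny = x + dx, y + dy
            # Bounds check on the neighbor, not on the current node.
            if 0 <= nx < grid_x and 0 <= ny < grid_y:
                result.append((nx, ny))
    return result


print(neighbors(0, 0, 5, 5))  # [(0, 1), (1, 0)]
```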

all values same sign validation

The user should enter all the values as either positive or negative.
How can I set up a same-sign validation?
Right now I have written this in before_save:
unless (self.alt_1 >= 0 && self.alt_2 >=0 && self.alt_3 >= 0 &&
self.alt_4 >= 0 && self.alt_5 >= 0 && self.alt_6 >= 0) ||
(self.alt_1 <= 0 && self.alt_2 <=0 && self.alt_3 <= 0 &&
self.alt_4 <= 0 && self.alt_5 <= 0 && self.alt_6 <= 0)
self.errors.add_to_base(_("All values sign should be same."))
end
first_sign = self.alt_1 <=> 0
(2..6).each do |n|
unless (self.send("alt_#{n}") <=> 0) == first_sign
errors.add_to_base(_("All values' signs should be same."))
break
end
end
With this method we first get the sign of alt_1 and then check whether the signs of the remaining values (alt_2 through alt_6) match it. As soon as we find one that doesn't, we add the validation error and stop, so it makes at most 6 sign comparisons and at least 2. Note that <=> returns 0 for zero, so unlike the original check, a zero mixed with positive or negative values is reported as an error.
Another, more clever but less efficient, method is to use the handy Enumerable#all? method, which returns true if the block passed to it returns true for every element:
range = 1..6
errors.add_to_base(_("All values' signs should be same.")) unless
range.all? {|n| self.send("alt_#{n}") >= 0 } ||
range.all? {|n| self.send("alt_#{n}") <= 0 }
Here we first check whether all of the values are greater than or equal to 0, and then whether all of them are less than or equal to 0. This makes at most 12 comparisons (6 per all? call), and fewer when an all? call stops at the first value that fails.
Here's a slightly different approach for you:
def all_same_sign?(ary)
  ary.map { |x| x <=> 0 }.each_cons(2).all? { |x| x[0] == x[1] }
end
all_same_sign? [1,2,3]   # => true
all_same_sign? [1,2,0]   # => false
all_same_sign? [-1, -5]  # => true
We use the spaceship operator to obtain the sign of each number, and we make sure that each element has the same sign as the element following it. You could also rewrite it to be more lazy by doing
ary.each_cons(2).all? { |x| (x[0] <=> 0) == (x[1] <=> 0) }
but that's less readable in my opinion.
values = [self.alt_1, self.alt_2, self.alt_3, self.alt_4, self.alt_5, self.alt_6]
# Valid when every value is <= 0, or every value is >= 0
unless [:<=, :>=].any? { |check| values.all? { |v| v.send(check, 0) } }
  self.errors.add_to_base(_("All values sign should be same."))
end
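The checks in this thread differ in how they treat zero. A Python sketch of both behaviors (the names are hypothetical): `all_same_sign` follows the >= / <= version from the question and the all? answer, where 0 fits either group, while `all_same_strict_sign` follows the <=> answers, where 0 counts as a sign of its own.

```python
def sign(x: float) -> int:
    """Python stand-in for Ruby's x <=> 0: returns -1, 0 or 1."""
    return (x > 0) - (x < 0)


def all_same_sign(values) -> bool:
    """True if every value is >= 0 or every value is <= 0 (zero fits either)."""
    return all(v >= 0 for v in values) or all(v <= 0 for v in values)


def all_same_strict_sign(values) -> bool:
    """Pairwise sign comparison, as with each_cons(2): zero is its own sign."""
    signs = [sign(v) for v in values]
    return all(a == b for a, b in zip(signs, signs[1:]))


print(all_same_sign([1, 2, 0]), all_same_strict_sign([1, 2, 0]))  # True False
```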
