Getting filename when using TextDirectoryLoader - weka - machine-learning

I am using the TextDirectoryLoader in Weka, which takes as input a directory in which the training data is stored as files arranged in folders, with each folder name indicating a class label. I pass the text_example directory name as an argument. The training part works fine.
Example:
+- text_example
   |
   +- class1
   |  |
   |  + file1.txt
   |  |
   |  + file2.txt
   |  |
   |  ...
   |
   +- class2
   |  |
   |  + another_file1.txt
   |  |
   |  + another_file2.txt
   |  |
   |  ...
The above illustration borrowed from here
For testing and predicting labels, I create a similar structure.
+- predictor_unknowns
   |
   +- unknown
   |  |
   |  + unknownfile1.txt
   |  |
   |  + unknownfile2.txt
   |  |
   |  ...
I again pass the directory predictor_unknowns as an argument to TextDirectoryLoader, and I can see that the prediction works fine, but I am not sure how to print the name of the file for which each prediction is made. I need to print unknownfile1.txt, unknownfile2.txt, etc. alongside their predictions.
Hope the question is clear enough.

In Weka, each text file and its class label become an Instance, and by default the filename is not stored in the Instance.
Instead, you can print the text content of the file that was classified.
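Instance current = getInstance(); // the instance to classify
double pred = classifier.classifyInstance(current);
// stringValue(0) returns the attribute's value; attribute(0) would only return its definition
System.out.println("\nText: " + current.stringValue(0)); // Change index according to your dataset
// tempInstances is the dataset that defines the class attribute
System.out.println("Class: " + tempInstances.classAttribute().value((int) pred));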

In the interest of benefiting others who may have this question, the documentation for the TextDirectoryLoader explains that you can save the filename as an extra attribute.
On the command line, just add the -F flag.
In Java code, you can use the following line (tdl is an instance of TextDirectoryLoader):
tdl.setOutputFilename(true);
As long as you do not run the dataset through any filters, each instance will have a string attribute called "filename". If you are planning to run the dataset through filters, it may be useful to use a FilteredClassifier so that you can still access the filename.

Related

Select query in Ruby on Rails

I am new to Ruby on Rails and I cannot work out the meaning of this line of code. I saw in the Rails documentation that select with a block loads the objects from the database for the scope, converts them into an array, and iterates through them using Array#select. Even so, I can't understand what this line of code returns or what it consists of.
model.legal_storages.select{|storage| storage.send(document_type)==true}.last
model.legal_storages.select { |storage| storage.send(document_type) == true }.last
Reading the chain from left to right:
model.legal_storages - Invoke the method legal_storages on model.
.select { ... } - Over the result of the previous method, call select to keep those elements for which the condition in the block evaluates to true.
storage.send(document_type) == true - For each element in model.legal_storages, invoke the method whose name is held by the variable document_type and check whether it is equal to true.
.last - From the result of the previous operation, take only the last element.
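The chain can be tried outside Rails with a minimal sketch in plain Ruby. The Storage struct, its contract and invoice flags, and the sample data are hypothetical stand-ins for the legal_storages association, which the question does not show; only select, send, and last come from the original line.

```ruby
# Hypothetical stand-in for a legal_storages record; the real model is not shown in the question.
Storage = Struct.new(:name, :contract, :invoice)

legal_storages = [
  Storage.new("a", true,  false),
  Storage.new("b", false, true),
  Storage.new("c", true,  true)
]

# Name of the method to call on each storage (assumed value for illustration).
document_type = :contract

# send(document_type) calls storage.contract here; select keeps the records where it returned true.
matching = legal_storages.select { |storage| storage.send(document_type) == true }

# last returns the final element of the filtered array, or nil if nothing matched.
result = matching.last
puts result.name  # prints "c"
```

Because last returns nil when no element passes the filter, calling code would probably want to handle that case before using the result.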

Splitting examples in Given and Then for SpecFlow Scenario Outline

I am writing a SpecFlow scenario with multiple input and output parameters (about 4-5 each). When using a scenario outline, I need to write a wide table with both input and output columns in the same row. Is there any way I can specify the examples separately for the individual steps? This is for improved readability.
Current state
Given - State of the data
When I trigger action with parameters <input1> and <input2> and ...
Then my output should contain <output1> and <output2> ...
Examples:
| input1 | input2 |... | output1 | output2 |...
Can I do this?
Given - State of the data
When I trigger action with parameters <input1> and <input2> and ...
Examples of input
Then my output should contain <output1> and <output2> ...
Examples of output
No, unfortunately that (or anything similar) is not possible.
You could make your inputs and outputs more abstract and possibly merge a few columns. Example: instead of Country | PostalCode | City | Street | House | Firstname | Lastname | etc. you should have | Address | Job title | with values like "EU", "US, missing postal code", "HQ" for the address.
You can't have multiple Example tables for scenario outline but you can pass in data tables for regular scenarios.
The data table will be accessible only to the step that uses it, however you could save it in Scenario Context for subsequent steps.
Not sure if this will work for you if your scenario is complex and spans multiple lines but I thought I'd mention it.
Scenario: Checking outputs for inputs
    Given - State of the data
    When I trigger action with the following parameters
        | input1 | input2 | input3 |
        | data   | data   | data   |
    Then my output should contain the following outputs
        | output1 | output2 | output3 |
        | data    | data    | data    |

How do I use dynamic arguments in my SpecFlow scenario background?

I have a feature that logs into a trading system and keys a number of trades. There are a lot of reusable steps at the beginning of each trade (the initial trade setup), but each trade has different arguments.
Here is an example
Scenario: Trade 1
Given I have selected my test data: "20003"
And I have connected to VMS with the following details:
| Field | Value |
| Username | user |
| Password | password |
| Session | myServer |
When I run the DCL command to set my privileges to full
Then I expect to see the following:
| Pass Criteria | Timeout |
| Privileges Set | 00:00:30 |
When I ICE to the test account: "System Test"
Then I expect to be ICED see the following:
| Pass Criteria | Timeout |
| "ICED to System Test" | "00:00:10" |
When I run a dcl to delete the company: "Test_Company"
Then I expect to see a confirmation that company: "Test_Company" has been deleted or doesnt exist
So within those steps, the two things that could change are the "Given" argument (the test data ID) and the test company at the end.
What I want is some way to run a background step that knows which parameters to enter. So for Trade 1, for example, it would enter 20003; for Trade 2 it would enter 20004, and so on.
Can I do this? I was thinking of using the "Examples" table that Scenario Outline uses. Or is there a better way to do this? I don't want these repeated steps in all of my scenarios, as they take up a lot of room and don't look very readable.
So I did some searching and couldn't find a solution that didn't require a lot of coding so I made this up:
this is what the background looks like
Background:
Given I have selected my test data:
| Scenario | ID |
| DirectCredit_GBP | 20003 |
| Cheque_GBP | 20004 |
| ForeignCheque_GBP | 20005 |
And in order to find which row it should use the method behind it uses ScenarioContext. Here is the method:
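[Given(@"I have selected my test data:")]
[When(@"I have selected my test data:")]
public static void setTestDataID(Table data)
{
    // The current scenario's title selects the matching row in the background table
    string scenario = ScenarioContext.Current.ScenarioInfo.Title;
    // ReadTable is the author's own extension method, not part of SpecFlow
    string testDataId = data.ReadTable("Scenario", scenario, "ID");
    TestDriver.LoadTestData(testDataId);
}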
The method searches the table for the scenario name (using an extension method I wrote) and gets the ID; once it has the ID, it passes it to my TestDriver method.
It seems to work fine and keeps the test readable.

Is it possible to parameterise the NUnit test case display name when using ``Ticked method names``?

I am testing out F# and using NUnit as my test library; I have discovered the use of double-back ticks to allow arbitrary method naming to make my method names even more human readable.
I was wondering, whether rightly or wrongly, if it is possible to parameterise the method names when using NUnit's TestCaseAttribute to change the method name, for example:
[<TestCase("1", 1)>]
[<TestCase("2", 2)>]
let ``Should return #expected when "#input" is supplied`` input expected =
...
This might not be exactly what you need, but if you want to go beyond unit testing, then TickSpec (a BDD framework using F#) has a nice feature where it lets you write parameterized scenarios based on back-tick methods that contain regular expressions as place holders.
For example, in Phil Trelford's blog post, he uses this to define tic-tac-toe scenario:
Scenario: Winning positions
Given a board layout:
| 1 | 2 | 3 |
| O | O | X |
| O | | |
| X | | X |
When a player marks X at <row> <col>
Then X wins
Examples:
| row | col |
| middle | right |
| middle | middle |
| bottom | middle |
The method that implements the When clause of the scenario is defined in F# using something like this:
let [<When>] ``a player marks (X|O) at (top|middle|bottom) (left|middle|right)``
        (mark:string,row:Row,col:Col) =
    let y = int row
    let x = int col
    Debug.Assert(System.String.IsNullOrEmpty(layout.[y].[x]))
    layout.[y].[x] <- mark
This is a neat feature, but it might be overkill if you just want to write a simple parameterized unit test. BDD is useful if you want to produce human-readable specifications of different scenarios (and there are actually other people reading them!).
This is not possible.
The basic issue is that for every input and expected you need to create a unique function. You would then need to pick the correct function to call (or your stacktrace wouldn't make sense). As a result this is not possible.
Having said that if you hacked around with something like eval (which must exist inside fsi), it might be possible to create something like this, but it would be very slow.

Creating Rowtests with SpecFlow

I am trying to create row tests using SpecFlow and the Microsoft built-in Test Framework, something along these lines:
Scenario Outline: Test Calculator
Given I have entered <x> into the calculator
And I have entered <y> into the calculator
When I press add
Then the result should be <result> on the screen
Examples:
| x | y | result|
| 1 | 2 | 3|
| 2 | 2 | 4|
The problem I am facing is that given any step in the Scenario Outline a separate step method is auto-generated for each value from the Examples table. I would like to be able to implement for each step a generic method receiving input values as parameters but it just does not seem to work.
In the end it looks like it works as expected, what I was missing were quotes around input parameters placeholders:
Scenario Outline: Test Calculator
Given I have entered "<x>" into the calculator
And I have entered "<y>" into the calculator
When I press add
Then the result should be "<result>" on the screen
Examples:
| x | y | result|
| 1 | 2 | 3|
| 2 | 2 | 4|
I had this same problem in VS 2012. I think it may be a bug with SpecFlow, because when I change the Scenario Outline to only be a Scenario, it generates everything correctly. All the documentation says you should not have to surround the placeholders in quotes.
In short, my solution is to change it to a Scenario to generate the steps. But don't forget, you have to change it back to a Scenario Outline to compile. This is what is working for me.
