Why does WEKA consider data as attributes?


I am trying to predict the category of a given string. Below is a sample of the ARFF file I have (redacted):
#ATTRIBUTE Text string
#ATTRIBUTE class {type1,type2,type3}
#DATA
"Data for type1" type1
"Data for type2" type2
"Data for type3" type3
"Data for type3" type3
"Data for type1" type1
"Data for type2" type2
I believe the attributes in the above set are Text and class. However, the program recognises each and every word in the data as an attribute. Is this expected?
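import java.util.Enumeration;
import weka.core.Attribute;

// Log the name of every attribute in the dataset (the class attribute, if set, is skipped).
Enumeration<Attribute> attributeEnumeration = dataset.enumerateAttributes();
while (attributeEnumeration.hasMoreElements()) {
    Attribute attribute = attributeEnumeration.nextElement();
    LOGGER.info("attr name" + attribute.name());
}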
I am following the weka docs along with
https://www.codingame.com/playgrounds/6734/machine-learning-with-java---part-5-naive-bayes
https://github.com/nsadawi/WEKA-API/tree/master/src
Thanks,
Jai

Related

machine learning - language prediction using their full names

My dataset contains three columns: first_name, Surname, and main language.
Suppose there is a person named 'prathap singh' who speaks Panjabi, and another named 'ajith komuravelly' who speaks Telugu.
Given the name 'Prathap Komuravelly', the person should be predicted to speak Telugu, since the surname carries more weight here, right?
How should we go about implementing this?
Do we need to assign weights to the separate columns, or what should I do at this step?

How to use a variable value to name a file using SPSS syntax?

I have a data set with 100+ variables for each of many people.
I have two variables for the person's name: lastname and firstname.
How can I create an output file using the names (i.e., the values of the variables lastname and firstname)?
I would like to do a split file by lastname firstname and export the data for each person into a text file that uses the person's name as the file name.
Below is what my spss command file is doing now. How can I get the command file to spit out separate files for each person?
SORT CASES BY lastname firstname .
SPLIT FILE BY lastname firstname .
PRINT / " ".
PRINT /
"===================================================================".
PRINT / "Report of Selected Responses from the".
PRINT / "Survey Form Document".
Print /"Responses for Candidate: " .
Print / firstname.
Print / lastname.
PRINT /
"===================================================================".
DO IF (Q1.5month EQ "" OR sysmis(q1.5day) OR sysmis(q1.5year) ).
Print
/ "1.5. Some or all of date of birth left blank".
END IF.
[More such print statements.]
EXECUTE.
SPLIT FILE OFF.

First create an empty string variable and populate it:
String firstname_lastname (A100).
Compute firstname_lastname=concat(rtrim(firstname),"_",rtrim(lastname)).
EXECUTE.
Then go to the Data menu, choose Split into Files, select your new firstname_lastname variable as the split variable, and under Options select "value labels". Pick a folder and you are done. You can also click Paste to get everything as syntax 😉

How to transform to Entity Attribute Value (EAV) using Spoon Normalise

I am trying to use Spoon (Pentaho Data Integration) to change data that is in typical row format to Entity Attribute Value format.
My source data is as follows:
My Normaliser is setup as follows:
And here are the results:
Why are the values for CONDITION_START_DATE and CONDITION_STOP_DATE in the string_value column instead of the date_value column?
According to this documentation:
Fieldname: Name of the fields to normalize
Type: Give a string to classify the field.
New field: You can give one or more fields where the new value should be transferred to.

Please check the "Normalizing multiple rows in a single step" section in http://wiki.pentaho.com/display/EAI/Row+Normaliser. According to it, you should have a group of fields with the same Type (pr_sl -> Product1, pr1_nr -> Product1); only in that case can you get multiple fields in the output (pr_sl -> Product Sales, pr1_nr -> Product Number).
In your case, you can convert the dates to strings, use the Row Normaliser with a single new field, and then use a formula, for example:
And then convert date_value to a date.

Updating the data-set when classifying new nominal instances

I'm using J48 to classify instances composed of both numeric and nominal values.
My problem is that I don't know in advance which nominal values I'll come across while my program runs.
Therefore I need to update the model's nominal-attribute data "on the fly".
For instance, say I have only 2 attributes, occupation and age, and the run goes as follows:
OccuptaionAttribute = {}.
input: [Piano teacher, 22].
OccuptaionAttribute = {Piano teacher}.
input: [school teacher, 30]
OccuptaionAttribute = {Piano teacher, school teacher}.
input: [Piano teacher, 40]
OccuptaionAttribute = {Piano teacher, school teacher}.
etc.
Now I've tried to do so manually by copying the previous attributes, adding the new attribute, and then updating the model's data.
That works fine when training the model.
But!
When I want to classify a new instance, say [SW engineer, 52], OccuptaionAttribute is updated:
OccuptaionAttribute = {Piano teacher, school teacher, SW engineer}, but the tree itself has never "met" "SW engineer" before, so the classification cannot be completed and an Exception is thrown.
Can you advise how to handle the above situation?
Does Weka have any mechanism that supports this?
Thanks!

When training, add a placeholder value such as __other__ to your nominal attributes.
Before classifying an instance, first check whether the value of the nominal attribute has been seen before; if it has not, use the placeholder value:
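// Look up the nominal attribute by name.
Attribute attribute = instances.attribute("OccuptaionAttribute");
String s = "SW engineer";
// indexOfValue returns -1 when the value is not among the attribute's known values.
int index = attribute.indexOfValue(s);
if (index == -1) {
    // Unseen value: fall back to the placeholder added at training time.
    index = attribute.indexOfValue("__other__");
}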
When you have enough data, train again with the new values.
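The fallback described above can be sketched without Weka, assuming the attribute's known values are held in a plain list whose positions play the role of the indices returned by indexOfValue. The class name UnseenNominalFallback and the method indexOrOther are hypothetical, chosen only for this illustration; this is a minimal sketch, not Weka's API.

```java
import java.util.Arrays;
import java.util.List;

class UnseenNominalFallback {
    // Placeholder label reserved at training time (an assumption of this sketch).
    static final String OTHER = "__other__";

    // Returns the position of value in knownValues, or the position of the
    // placeholder when the value was never seen during training.
    public static int indexOrOther(List<String> knownValues, String value) {
        int index = knownValues.indexOf(value);
        if (index == -1) {
            index = knownValues.indexOf(OTHER);
        }
        return index;
    }

    public static void main(String[] args) {
        // Stand-in for the nominal attribute's value list, placeholder included.
        List<String> occupation = Arrays.asList(OTHER, "Piano teacher", "school teacher");
        System.out.println(indexOrOther(occupation, "Piano teacher")); // prints 1
        System.out.println(indexOrOther(occupation, "SW engineer"));   // prints 0
    }
}
```

If the placeholder itself is missing from the list, the method still returns -1, so the caller should make sure the placeholder was added before training.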

How can I Alias Data in a WebGrid

I have an existing database that I can't change. I have a column of type int that has various numbers that mean something.
This something would be a string. For example, 1="dog", 2="cat", 3="bird". There are a dozen or so integers to deal with.
I'm using ASP.NET MVC 3 with EF 4.1 and have a WebGrid bound to the Model. Is there a way to alias these integers in the WebGrid so that it displays the string values that mean something to the user?
Any help on this would be greatly appreciated!

Add an enum to your Model:
public enum foo
{
    dog = 1, cat, bird // etc.
}
Hopefully, you are using ViewModels. If you are, add a property for the enum:
public foo thing {get;set;}
And set the value of thing based on the integer value you get from the database:
Model m = new Model{number = 3};
m.thing = (foo) m.number;
Or you could create a helper and use that within the format parameter to set a value based on the integer, or you could use JavaScript/jQuery to alter the values from ints to strings once they have been rendered to the browser.
