Recently I have been working on machine learning and have built some models for a classification problem with the help of some tutorials. Although I solved my problem successfully, I can't grasp the purpose and meaning of the "NumericToNominal" method. Please explain it to me.
I have tried to learn from the available material, but it is very heavy going; I am looking for a simple explanation.
Thanks and regards
I searched a lot and finally found a simple explanation: "A set of data is said to be nominal if the values/observations belonging to it can be assigned a code in the form of a number where the numbers are simply labels," for example the PIN code of a city. Although we use numeric values to build the codes, and you can apply simple algebra to PIN codes, the result won't make any sense. The attribute SEX, which could be male or female, is also a kind of nominal attribute.
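To make this concrete, here is a minimal sketch in Python using pandas (my own illustration with made-up data; WEKA's NumericToNominal filter performs the equivalent conversion inside WEKA):

import pandas as pd

# Hypothetical toy data: PIN codes look numeric, but arithmetic on them is meaningless.
df = pd.DataFrame({"pin_code": [110001, 400001, 560001],
                   "sex": ["male", "female", "male"]})
print(df.dtypes)  # pin_code is int64, so mean()/sum() would "work" despite being meaningless

# Declaring the column categorical (nominal) tells downstream tools the values are labels,
# which is what applying NumericToNominal to a numeric attribute achieves in WEKA.
df["pin_code"] = df["pin_code"].astype("category")
print(df.dtypes)  # pin_code is now category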
Thanks
I'm stuck with a problem statement of predicting an identifier for a product on the basis of a couple of product features. A sample of the data available to me looks like the one shown below:
ABC10L 20.0 34 XYZ G345F FG MKD -> 000000DEF_VYA
Here, ABC10L, 20.0, 34, XYZ, G345F, FG, MKD are the features and 000000DEF_VYA is the unique identifier associated with the product. Initially I tried to formulate this as a regression problem, but I'm not sure how to generate textual output from my model or what my cost function should be. I'm also not sure whether regression is the right tool for the issue here.
Please help by suggesting the right approach and how I may proceed to solve this!
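One common framing, sketched below under the assumption that the set of possible identifiers is fixed, is to treat the identifier as a class label and use multi-class classification instead of regression; that sidesteps generating text entirely. The rows here are hypothetical stand-ins for the real data, and the second identifier is made up for illustration:

from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical rows mirroring the sample format: feature columns -> identifier label.
X_raw = [["ABC10L", "20.0", "34", "XYZ", "G345F", "FG", "MKD"],
         ["ABC11L", "21.5", "35", "XYZ", "G346F", "FG", "MKD"]]
y = ["000000DEF_VYA", "000000DEF_VYB"]  # made-up second identifier for illustration

# Encode the categorical feature columns as integers for the tree model.
X = OrdinalEncoder().fit_transform(X_raw)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
print(clf.predict(X[:1]))  # the "textual output" is simply the predicted class label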
I am currently integrating the BERT model listed on https://developer.apple.com/machine-learning/models/#text into an iOS application and have had difficulty removing answers that have low certainty.
I have used the sample code found at the link above, but because I wanted to answer questions based on larger volumes of text, I loop over an array of paragraphs and predict an answer for each one. However, the model does not return nil or "No Answer" when an answer is not found; instead it returns a (seemingly) random substring. I suppose what I am trying to ask is: is it possible to access the certainty of BERT's response in order to filter out unlikely results? Or is there another way to get BERT to only return results above a set certainty threshold?
After hours of searching, I've now found a solution. Ironically, it only took three lines of code, but here it is anyway:
// Discard answers whose best start+end logit sum falls below an empirically chosen threshold.
if bestSum < 7.5 {
    return nil
}
I implemented this in the findBestLogitPair() method in the BERTOutput.swift file provided in Apple's sample code for text analysis using BERT. I have since discovered that the word "logit" does have a precise meaning in statistics (it is the log-odds of a probability) - but being a programmer, I had no idea!
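For anyone wanting a threshold on a more interpretable scale, logits can be converted to probabilities with a softmax. This is a sketch of the general idea in plain Python (my own illustration, not code from Apple's sample project):

import math

# Convert a list of logits into probabilities that sum to 1.
def softmax(logits):
    exps = [math.exp(x - max(logits)) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

start_logits = [0.2, 5.1, 1.3]
print(softmax(start_logits))  # e.g. keep the answer only if the best probability exceeds 0.5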
I'm using Z3 with the ML (OCaml) interface. I created a formula
f(x_i)
that is satisfiable, according to the solver
Solver.mk_simple_solver ctx.
The problem is: I can get a model, but it gives me values for only some of the variables in the formula, not all of them (some of my Model.get_const_interp_e calls return None).
How can it be that the model gives me only some of the x_i? In my understanding, if the model works for one of the values, the formula was satisfiable (in my case, it is), and so values should be available for all of them...
I must be missing something.
Thanks for reading!
You should always post full examples so people can help with the actual coding issue; without seeing your code, it's impossible to know the real reason.
Having said that, this sounds very much like the following question: Why Z3Py does not provide all possible solutions. So perhaps the answer given there will help you.
Long story short: Z3 models will only contain values for variables that matter to the model. For anything that is not explicitly assigned, any value will do. There are ways to get "full" models, as explained in that answer, of course; and I'm sure the same is possible from the ML interface.
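Here is a small demonstration with Z3's Python bindings (the linked answer is about Z3Py; the OCaml API has a similar model-completion flag on Model.eval):

from z3 import Int, Solver, sat

x, y = Int("x"), Int("y")
s = Solver()
s.add(x > 3)  # y is never constrained, so it does not "matter" to the model

assert s.check() == sat
m = s.model()
print(m[y])                              # None: y has no interpretation in the model
print(m.eval(y, model_completion=True))  # asks Z3 to complete the model with some value for y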
I'm trying to compare J48 and MLP on a variety of datasets using WEKA. One of these is: https://archive.ics.uci.edu/ml/datasets/primary+tumor. I have converted this to CSV form, which can be easily imported into WEKA. You can download this file here: https://ufile.io/8nj13
I used the "numeric to nominal" filter on the class and all the attributes to fit the natural structure of the data. However, when I ran J48 (and MLP), I got a bunch of question marks "?" in my output, presumably due to not having enough observations/instances of the appropriate type.
How can I get around this? I'm sure there must be a filter for this kind of thing. I've attached a picture below.
The detailed accuracy table displays a question mark when no instance was actually classified as that specific class. For example, since no instance was classified as class 16, WEKA cannot provide you with details regarding class 16 classifications. This image might help you understand.
Regarding the number of instances of the appropriate class, you can use the ClassBalancer filter, found at weka/filters/supervised/instance/ClassBalancer. This should help balance out the numbers of the various classes.
Also note that your dataset contains some missing values; this could be addressed by either discarding the instances with missing data or running the ReplaceMissingValues filter, found at weka/filters/unsupervised/attribute/ReplaceMissingValues.
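If it helps to see those two ideas outside the WEKA GUI, here is a rough Python analogue (an illustration only, with toy data; the WEKA filters themselves are Java classes used from the Explorer or command line):

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.utils.class_weight import compute_sample_weight

X = np.array([[1.0, np.nan], [2.0, 3.0], [np.nan, 0.0]])
y = np.array([0, 0, 1])

# ReplaceMissingValues analogue: fill in missing entries (WEKA uses per-attribute means/modes).
X_filled = SimpleImputer(strategy="mean").fit_transform(X)

# ClassBalancer analogue: reweight instances so each class carries equal total weight.
weights = compute_sample_weight(class_weight="balanced", y=y)
print(X_filled)
print(weights)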
So I've been wondering whether there is a way to tokenize/tag TV or movie file names using NLP/machine learning.
I know there are a lot of regex-based approaches out there that do this already, but shouldn't it be possible to get this done with NLP/machine learning as well?
Example:
The.Heart.Guy.S01E07.Die.Belastungsprobe.German.DL.720p.HDTV.x264-GDR
Should be something like:
The Heart Guy SHOW-NAME
1 SEASON
7 EPISODE
Die Belastungsprobe EP-NAME
German DL LANGUAGE
720p RESOLUTION
HDTV SOURCE
x264 CODEC
GDR GROUP
Has anyone ever tried something like this? Any hints on where one should start, or on whether it's even possible to get something like this working?
Machine learning approaches would cost more than rule-based approaches, but if you want to try a machine learning solution, the best one that comes to my mind is to use Markov models, since the problem has sequential observations and can be handled with finite state automata. You can use this paper as a reference.
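As a concrete starting point for the Markov-model route, here is a sketch with NLTK's supervised HMM tagger (my own illustration; the referenced paper is not reproduced here). Real use would need many labelled file names, not one:

from nltk.tag import hmm

# Training data format: one list of (token, tag) pairs per labelled file name.
train = [[("The", "SHOW-NAME"), ("Heart", "SHOW-NAME"), ("Guy", "SHOW-NAME"),
          ("S01E07", "SEASON-EPISODE"), ("German", "LANGUAGE"), ("DL", "LANGUAGE"),
          ("720p", "RESOLUTION"), ("HDTV", "SOURCE"), ("x264", "CODEC"), ("GDR", "GROUP")]]

tagger = hmm.HiddenMarkovModelTrainer().train_supervised(train)
print(tagger.tag("The Heart Guy S01E07 720p".split()))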
I suspect using regexes is the easiest solution to this, but if you're willing to put in some time, Conditional Random Fields are also a great fit. Here's an article about the New York Times using a CRF-based model on recipe data.
Another example of CRFs on short text is libpostal, which extracts parts of postal addresses.
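A minimal sketch of that CRF approach, assuming the sklearn-crfsuite package (tokens come from splitting the example file name on dots; the features are illustrative, not taken from the linked article):

import sklearn_crfsuite

# Very simple per-token features; a real tagger would add context (neighbouring tokens, etc.).
def token_features(tokens, i):
    t = tokens[i]
    return {
        "lower": t.lower(),
        "is_digit": t.isdigit(),
        "looks_like_sXXeYY": t.lower().startswith("s") and any(c.isdigit() for c in t),
        "position": i,
    }

tokens = "The.Heart.Guy.S01E07.Die.Belastungsprobe.German.DL.720p.HDTV.x264-GDR".split(".")
labels = ["SHOW-NAME", "SHOW-NAME", "SHOW-NAME", "SEASON-EPISODE", "EP-NAME", "EP-NAME",
          "LANGUAGE", "LANGUAGE", "RESOLUTION", "SOURCE", "CODEC-GROUP"]

X = [[token_features(tokens, i) for i in range(len(tokens))]]
y = [labels]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)  # with one sequence this just memorises; real training needs many file names
print(crf.predict(X))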