Using SVM to predict text with label

Using SVM to predict text with label - machine-learning

I have data in a csv file in the following format
Name Power Money
Jon Red 30
George blue 20
Tom Red 40
Bob purple 10
I consider values like "jon", "red" and "30 as inputs. Each input as a label. For instance inputs [jon,george,tom,bob] have label "name". Inputs [red,blue,purple] have label "power". This is basically how I have training data. I have bunch of values that are each mapped to a label.
Now I want to use svm to train a model based on my training data to accurately identify given a new input what is its correct label. so for instance if the input provided is "444" , the model should be smart enough to categorize it as a "Money" label.
I have installed py and also installed sklearn. I have completed the following tutorial as well. I am just not sure on how to prepare input data to train the model.
Also I am new to machine learning if i have said something that sounds wrong or odd please point it out as I will be happy to learn the correct.

With how your current question is formulated, you are not dealing with a typical machine learning problem. Currently, you have column-wise data:
Name Power Money
Jon Red 30
George blue 20
Tom Red 40
Bob purple 10
If a user now inputs "Jon", you know it is going to be type "Name", by a simple hash-map look up, e.g.,:
hashmap["Jon"] -> "Name"
The main reason people are saying it is not a machine-learning problem is your "categorisation" or "prediction" is being defined by your column names. Machine learning problems, instead (typically), will be predicting some response variable. For example, imagine instead you had asked this:
Name Power Money Bought_item
Jon Red 30 yes
George blue 20 no
Tom Red 40 no
Bob purple 10 yes
We could build a model to predict Bought_item using the features Name, Power, and Money using SVM.
Your problem would have to look more like:
Feature1 Feature2 Feature3 Category
1.0 foo bar Name
3.1 bar foo Name
23.4 abc def Money
22.22 afb dad Power
223.1 dad vxv Money
You then use Feature1, Feature2, and Feature3 to predict Category. At the moment your question does not give enough information for anyone to really understand what you need or what you have to reformulate it this way, or consider an unsupervised approach.
Edit:
So frame it this way:
Name Power Money Label
Jon Red 30 Foo
George blue 20 Bar
Tom Red 40 Foo
Bob purple 10 Bar
OneHotEncode Name and Power, so you now have a variable for each name that can be 0/1.
Standardise Money so that it has a range between, approximately, -1/1.
LabelEncode your labels so that they are 0,1,2,3,4,5,6 and so on.
Use a One vs. All classifier, http://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html.

Related

Sequence Models Word2vec

I am working on data-set with more than 100,000 records.
This is how the data looks like:
email_id cust_id campaign_name
123 4567 World of Zoro
123 4567 Boho XYz
123 4567 Guess ABC
234 5678 Anniversary X
234 5678 World of Zoro
234 5678 Fathers day
234 5678 Mothers day
345 7890 Clearance event
345 7890 Fathers day
345 7890 Mothers day
345 7890 Boho XYZ
345 7890 Guess ABC
345 7890 Sale
I am trying to understand the campaign sequence and predict the next possible campaign for the customers.
Assume I have processed my data and stored it in 'camp'.
With Word2Vec-
from gensim.models import Word2Vec
model = Word2Vec(sentences=camp, size=100, window=4, min_count=5, workers=4, sg=0)
The problem with this model is that it accepts tokens and spits out text-tokens with probabilities in return when looking for similarities.
Word2Vec accepts this form of input-
['World','of','Zoro','Boho','XYZ','Guess','ABC','Anniversary','X'...]
And gives this form of output -
model.wv.most_similar('Zoro')
[Guess,0.98],[XYZ,0.97]
Since I want to predict campaign sequence, I was wondering if there is anyway I can give below input to the model and get the campaign name in the output
My input to be as -
[['World of Zoro','Boho XYZ','Guess ABC'],['Anniversary X','World of
Zoro','Fathers day','Mothers day'],['Clearance event','Fathers day','Mothers
day','Boho XYZ','Guess ABC','Sale']]
Output -
model.wv.most_similar('World of Zoro')
[Sale,0.98],[Mothers day,0.97]
I am also not sure if there is any functionality within the Word2Vec or any similar algorithms which can help predicting campaigns for individual users.
Thank you for your help.

I don't believe that word2vec is the right approach to model your problem.
Word2vec uses two possible approaches: Skip-gram (given a target word predict its surrounding words) or CBOW (given the surrounding words predict the target word). Your case is similar to the context of CBOW, but there is no reason why the phenomenon that you want to model would respect the linguistic "rules" for which word2vec has been developed.
word2vec tends to predict the word that occurs more frequently in combination with the targeted one within the moving window (in your code: window=4). So it won't predict the best possible next choice but the one that occurred most often in the window span of the given word.
In your call to word2vec (Word2Vec(sentences=camp, size=100, window=4, min_count=5, workers=4, sg=0)) you are also using min_count=5 so the model is ignoring the words that have a frequency less than 5. Depending on your dataset size, there could be a loss of relevant information.
I suggest to give a look to forecasting techniques and time series analysis methods. I have the feeling that you will obtain better prediction using these techniques rather word2vec. (https://otexts.org/fpp2/index.html)
I hope it helps

How can a machine learning model handle unseen data and unseen label?

I am trying to solve a text classification problem. I have a limited number of labels that capture the category of my text data. If the incoming text data doesn't fit any label, it is tagged as 'Other'. In the below example, I built a text classifier to classify text data as 'breakfast' or 'italian'. In the test scenario, I included couple of text data that do not fit into the labels that I used for training. This is the challenge that I'm facing. Ideally, I want the model to say - 'Other' for 'i like hiking' and 'everyone should understand maths'. How can I do this?
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfTransformer
X_train = np.array(["coffee is my favorite drink",
"i like to have tea in the morning",
"i like to eat italian food for dinner",
"i had pasta at this restaurant and it was amazing",
"pizza at this restaurant is the best in nyc",
"people like italian food these days",
"i like to have bagels for breakfast",
"olive oil is commonly used in italian cooking",
"sometimes simple bread and butter works for breakfast",
"i liked spaghetti pasta at this italian restaurant"])
y_train_text = ["breakfast","breakfast","italian","italian","italian",
"italian","breakfast","italian","breakfast","italian"]
X_test = np.array(['this is an amazing italian place. i can go there every day',
'i like this place. i get great coffee and tea in the morning',
'bagels are great here',
'i like hiking',
'everyone should understand maths'])
classifier = Pipeline([
('vectorizer', CountVectorizer()),
('tfidf', TfidfTransformer()),
('clf', MultinomialNB())])
classifier.fit(X_train, y_train_text)
predicted = classifier.predict(X_test)
proba = classifier.predict_proba(X_test)
print(predicted)
print(proba)
['italian' 'breakfast' 'breakfast' 'italian' 'italian']
[[0.25099411 0.74900589]
[0.52943091 0.47056909]
[0.52669142 0.47330858]
[0.42787443 0.57212557]
[0.4 0.6 ]]
I consider the 'Other' category as noise and I cannot model this category.

I think Kalsi might have suggested this but it was not clear to me. You could define a confidence threshold for your classes. If the predicted probability does not achieve the threshold for any of your classes ('italian' and 'breakfast' in your example), you were not able to classify the sample yielding the 'other' "class".
I say "class" because other is not exactly a class. You probably don't want your classifier to be good at predicting "other" so this confidence threshold might be a good approach.

You cannot do that.
You have trained the model to predict only two labels i.e., breakfast or italian. So the model doesn't have any idea about the third label or the fourth etc.
You and me know that "i like hiking" is neither breakfast nor italian. But how a model a would know that ? It only knows breakfast & italian. So there has to be a way to tell the model that: If you get confused between breakfast &italian, then predict the label as other
You can achieve this by training the model which is having other as label with some texts like "i like hiking" etc
But in your case, a little hack can be done as follows.
So what does it mean when a model predicts a label with 0.5 probability (or approximately 0.5)? It means that model is getting confused between the labels breakfast and italian. So here you can take advantage of this.
You can take all the predicted probability values & assign the label other if the probability value is between 0.45 & 0.55 . In this way you can predict the other label (obviously with some errors) without letting the model knowing there is a label called other

You can try setting class priors when creating the MultinomialNB. You could create a dummy "Other" training example, and then set the prior high enough for Other so that instances default to Other when there aren't enough evidence to select the other classes.

No, you cannot do that.
You have to define a third category "other" or whatever name that suits you and give your model some data related to that category. Make sure that number of training examples for all three categories are somewhat equal, otherwise "other" being a very broad category could skew your model towards "other" category.
Other way to approach this, is to get noun phrases from all your sentences for different categories including other and then feed into the model, consider this as a feature selection step for your machine learning model. In this way noise added by irrelevant words will be removed, better performance than tf-idf.
If you have huge data, go for deep learning models which does feature selection automatically.
Dont go with manipulating probabilities by yourself approach, 50-50% probability means that the model is confused between two classes which you have defined, it has no idea about the third "other class".
Lets say the sentence is "I want italian breakfast", the model will be confused whether this sentence belongs to "italian" or "breakfast" category but that doesnt mean it belongs to "other" category".

Estimating both the category and the magnitude of output using neural networks

Let's say I want to calculate which courses a final year student will take and which grades they will receive from the said courses. We have data of previous students'courses and grades for each year (not just the final year) to train with. We also have data of the grades and courses of the previous years for students we want to estimate the results for. I want to use a recurrent neural network with long-short term memory to solve this problem. (I know this problem can be solved by regression, but I want the neural network specifically to see if this problem can be properly solved using one)
The way I want to set up the output (label) space is by having a feature for each of the possible courses a student can take, and having a result between 0 and 1 in each of those entries to describe whether if a student will attend the class (if not, the entry for that course would be 0) and if so, what would their mark be (ie if the student attends class A and gets 57%, then the label for class A will have 0.57 in it)
Am I setting the output space properly?
If yes, what optimization and activation functions I should use?
If no, how can I re-shape my output space to get good predictions?

If I understood you correctly, you want that the network is given the history of a student, and then outputs one entry for each course. This entry is supposed to simultaneously signify whether the student will take the course (0 for not taking the course, 1 for taking the course), and also give the expected grade? Then the interpretation of the output for a single course would be like this:
0.0 -> won't take the course
0.1 -> will take the course and get 10% of points
0.5 -> will take the course and get half of points
1.0 -> will take the course and get full points
If this is indeed your plan, I would definitely advise to rethink it.
Some obviously realistic cases do not fit into this pattern. For example, how would you represent an (A+)-student is "unlikely" to take a course? Should the network output 0.9999, because (s)he is very likely to get the maximum amount of points if (s)he takes the course, OR should the network output 0.0001, because the student is very unlikely to take the course?
Instead, you should output two values between [0,1] for each student and each course.
First value in [0, 1] gives the probability that the student will participate in the course
Second value in [0, 1] gives the expected relative number of points.
As loss, I'd propose something like binary cross-entropy on the first value, and simple square error on the second, and then combine all the losses using some L^p metric of your choice (e.g. simply add everything up for p=1, square and add for p=2).
Few examples:
(0.01, 1.0) : very unlikely to participate, would probably get 100%
(0.5, 0.8): 50%-50% whether participates or not, would get 80% of points
(0.999, 0.15): will participate, but probably pretty much fail
The quantity that you wanted to output seemed to be something like the product of these two, which is a bit difficult to interpret.

There is more than one way to solve this problem. Andrey's answer gives a one good approach.
I would like to suggest simplifying the problem by bucketing grades into categories and adding an additional category for "did not take", for both input and output.
This turns the task into a classification problem only, and solves the issue of trying to differentiate between receiving a low grade and not taking the course in your output.
For example your training set might have m students, n possible classes, and six possible results: ['A', 'B', 'C', 'D', 'F', 'did_not_take'].
And you might choose the following architecture:
Input -> Dense Layer -> RELU -> Dense Layer -> RELU -> Dense Layer -> Softmax
Your input shape is (m, n, 6) and your output shape could be (m, n*6), where you apply softmax for every group of 6 outputs (corresponding to one class) and sum into a single loss value. This is an example of multiclass, multilabel classification.
I would start by trying 2n neurons in each hidden layer.
If you really want a continuous output for grades, however, then I recommend using separate classification and regression networks. This way you don't have to combine classification and regression loss into one number, which can get messy with scaling issues.
You can keep the grade buckets for input data only, so the two networks take the same input data, but for the grade regression network your last layer can be n sigmoid units with log loss. These will output numbers between 0 and 1, corresponding the predicted grade for each class.
If you want to go even further, consider using an architecture that considers the order in which students took previous classes. For example if a student took French I the previous year, it is more likely he/she will take French II this year than if he/she took French Freshman year and did not continue with French after that.

What is different between RNN output and rule based output?

I am new in machine learning and i got a question , I am following this tutorial , I read about LSTM and RNN. i use the code provided by tutorial and run it , it completed the training and now i gave some strings for testing :
Training data is this :
output is :
Iter= 20000, Average Loss= 0.531466, Average Accuracy= 84.60%
['the', 'sly', 'and'] - [treacherous] vs [treacherous]
Optimization Finished!
Elapsed time: 12.159853319327036 min
Run on command line.
tensorboard --logdir=/tmp/tensorflow/rnn_words
Point your web browser to: http://localhost:6006/
3 words: ,hello wow and
Word not in dictionary
3 words: mouse,mouse,mouse
3 words: mouse
3 words: mouse mouse mouse
mouse mouse mouse very well , but who is to bell the cat approaches the until will at one another and take mouse a receive some signal of her approach , we he easily escape
3 words: 3 words: had a general
had a general to proposal to make round the neck will all agree , said he easily at and enemy approaches to consider what common the case . you will all agree , said he
3 words: mouse mouse mouse
mouse mouse mouse very well , but who is to bell the cat approaches the until will at one another and take mouse a receive some signal of her approach , we he easily escape
3 words: what was cat
what was cat up and said he is all very well , but who is to bell the cat approaches the until will at one another and take mouse a receive some signal of her
3 words: mouse fear cat
Word not in dictionary
3 words: mouse tell cat
Word not in dictionary
mo3 words: mouse said cat
Word not in dictionary
3 words: mouse fear fear
Word not in dictionary
3 words: mouse ring bell
Word not in dictionary
m3 words: mouse ring ring
Word not in dictionary
3 words: mouse bell bell
mouse bell bell and general to make round the neck will all agree , said he easily at and enemy approaches to consider what common the case . you will all agree , said he
3 words: mouse and bell
mouse and bell this means we should always , but looked is young always , but looked is young always , but looked is young always , but looked is young always , but looked
3 words: mouse was bell
mouse was bell and said he is all very well , but who is to bell the cat approaches the until will at one another and take mouse a receive some signal of her approach
3 words:
Now what i am not getting , When i give three words it gives result something like which we can easily achieve via regular expression or rule based code using if-else like if input words in file then fetch some sentence previous or next sentences , What is special about this output , How its different ? Explain please
like sometimes it says , word not in dict , so if i have to give only those words which are in training file then its like its matching inout words in training data and fetching some result from file then we can do same thing with if else or in pure programming without any module then how's its different ?

Your training dataset only has ~180 words, and is achieving a 84.6% (training) accuracy, so it is overfitting quite a bit. Essentially, the model is simply predicting the next most likely word based on the training data.
Usually language models are trained on much larger datasets, such as PTB or the 1B word benchmark. PTB is a small dataset, with 100,000 words, and the 1B word benchmark has 1 billion words.
RNN models have a limited vocabulary to allow words or characters to be encoded. The vocabulary size would depend on model. Most word models that train on PTB have a vocabulary size of 10,000, which is enough for most common words.

What is the difference between a feature and a label? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 2 years ago.
Improve this question
I'm following a tutorial about machine learning basics and there is mentioned that something can be a feature or a label.
From what I know, a feature is a property of data that is being used. I can't figure out what the label is, I know the meaning of the word, but I want to know what it means in the context of machine learning.

Briefly, feature is input; label is output. This applies to both classification and regression problems.
A feature is one column of the data in your input set. For instance, if you're trying to predict the type of pet someone will choose, your input features might include age, home region, family income, etc. The label is the final choice, such as dog, fish, iguana, rock, etc.
Once you've trained your model, you will give it sets of new input containing those features; it will return the predicted "label" (pet type) for that person.

Feature:
In Machine Learning feature means property of your training data. Or you can say a column name in your training dataset.
Suppose this is your training dataset
Height Sex Age
61.5 M 20
55.5 F 30
64.5 M 41
55.5 F 51
. . .
. . .
. . .
. . .
Then here Height, Sex and Age are the features.
label:
The output you get from your model after training it is called a label.
Suppose you fed the above dataset to some algorithm and generates a model to predict gender as Male or Female, In the above model you pass features like age, height etc.
So after computing, it will return the gender as Male or Female. That's called a Label

Here comes a more visual approach to explain the concept. Imagine you want to classify the animal shown in a photo.
The possible classes of animals are e.g. cats or birds.
In that case the label would be the possible class associations e.g. cat or bird, that your machine learning algorithm will predict.
The features are pattern, colors, forms that are part of your images e.g. furr, feathers, or more low-level interpretation, pixel values.
Label: Bird
Features: Feathers
Label: Cat
Features: Furr

Prerequisite: Basic Statistics and exposure to ML (Linear Regression)
It can be answered in a sentence -
They are alike but their definition changes according to the necessities.
Explanation
Let me explain my statement. Suppose that you have a dataset, for this purpose consider exercise.csv. Each column in the dataset are called as features. Gender, Age, Height, Heart Rate, Body_temp, and Calories might be one among various columns. Each column represents distinct features or property.
exercise.csv
User_ID Gender Age Height Weight Duration Heart_Rate Body_Temp Calories
14733363 male 68 190.0 94.0 29.0 105.0 40.8 231.0
14861698 female 20 166.0 60.0 14.0 94.0 40.3 66.0
11179863 male 69 179.0 79.0 5.0 88.0 38.7 26.0
To solidify the understanding and clear out the puzzle let us take two different problems (prediction case).
CASE1: In this case we might consider using - Gender, Height, and Weight to predict the Calories burnt during exercise. That prediction(Y) Calories here is a Label. Calories is the column that you want to predict using various features like - x1: Gender, x2: Height and x3: Weight .
CASE2: In the second case here we might want to predict the Heart_rate by using Gender and Weight as a feature. Here Heart_Rate is a Label predicted using features - x1: Gender and x2: Weight.
Once you have understood the above explanation you won't really be confused with Label and Features anymore.

Let's take an example where we want to detect the alphabet using handwritten photos. We feed these sample images in the program and the program classifies these images on the basis of the features they got.
An example of a feature in this context is: the letter 'C' can be thought of like a concave facing right.
A question now arises as to how to store these features. We need to name them. Here's the role of the label that comes into existence. A label is given to such features to distinguish them from other features.
Thus, we obtain labels as output when provided with features as input.
Labels are not associated with unsupervised learning.

A feature briefly explained would be the input you have fed to the system and the label would be the output you are expecting. For example, you have fed many features of a dog like his height, fur color, etc, so after computing, it will return the breed of the dog you want to know.

Suppose you want to predict climate then features given to you would be historic climate data, current weather, temperature, wind speed, etc. and labels would be months.
The above combination can help you derive predictions.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart