Neural network multiple choice exams - machine-learning

How likely is it to succeed in training neural network (e.g. simple feedforward/backprop multilayer perceptron) to solve multiple choice (text based) questions - and if low likelihood - what would be smarter ways to go (or don't go) about this problem?
Here's more information on the multiple choice exam structure:
5 lines of text
1/5 answers (1-2 lines of text each) are correct
also some more assumptions:
results/feedback immediately displayed
the training data is over 5'000 questions

In my opinion this problem is extremely difficult to solve. Basically, you are trying to teach a neural network to understand a natural language. Obviously, there were many attempts to solve this task but no significant success yet.
This may be possible (but still unlikely) only if exam questions are very simple, highly specialized and have some special common structure.
Also, 5000 questions sample seems pretty small for this task.

Related

How to give a logical reason for choosing a model

I used machine learning to train depression related sentences. And it was LinearSVC that performed best. In addition to LinearSVC, I experimented with MultinomialNB and LogisticRegression, and I chose the model with the highest accuracy among the three. By the way, what I want to do is to be able to think in advance which model will fit, like ml_map provided by Scikit-learn. Where can I get this information? I searched a few papers, but couldn't find anything that contained more detailed information other than that SVM was suitable for text classification. How do I study to get prior knowledge like this ml_map?
How do I study to get prior knowledge like this ml_map?
Try to work with different example datasets on different data types by using different algorithms. There are hundreds to be explored. Once you get the good grasp of how they work, it will become more clear. And do not forget to try googling something like advantages of algorithm X, it helps a lot.
And here are my thoughts, I think I used to ask such questions before and I hope it can help if you are struggling: The more you work on different Machine Learning models for a specific problem, you will soon realize that data and feature engineering play the more important parts than the algorithms themselves. The road map provided by scikit-learn gives you a good view of what group of algorithms to use to deal with certain types of data and that is a good start. The boundaries between them, however, are rather subtle. In other words, one problem can be solved by different approaches depending on how you organize and engineer your data.
To sum it up, in order to achieve a good out-of-sample (i.e., good generalization) performance while solving a problem, it is mandatory to look at the training/testing process with different setting combinations and be mindful with your data (for example, answer this question: does it cover most samples in terms of distribution in the wild or just a portion of it?)

What is the difference between machine learning and deep learning in building a chatbot?

To be more specific, The traditional chatbot framework consists of 3 components:
NLU (1.intent classification 2. entity recognition)
Dialogue Management (1. DST 2. Dialogue Policy)
NLG.
I am just confused that If I use a deep learning model(seq2seq, lstm, transformer, attention, bert…) to train a chatbot, Is it cover all those 3 components? If so, could you explain more specifically how it related to those 3 parts? If not, how can I combine them?
For example, I have built a closed-domain chatbot, but it is only task-oriented which cannot handle the other part like greeting… And it can’t handle the problem of Coreference Resolution (it seems doesn't have Dialogue Management).
It seems like your question can be split into two smaller questions:
What is the difference between machine learning and deep learning?
How does deep learning factor into each of the three components of chatbot frameworks?
For #1, deep learning is an example of machine learning. Think of your task as a graphing problem. You transform your data so it has an n-dimensional representation on a plot. The goal of the algorithm is to create a function that represents a line drawn on the plot that (ideally) cleanly separates the points from one another. Each sector of the graph represents whatever output you want (be it a class/label, related words, etc). Basic machine learning creates a line on a 'linearly separable' problem (i.e. it's easy to draw a line that cleanly separates the categories). Deep learning enables you to tackle problems where the line might not be so clean by creating a really, really, really complex function. To do this, you need to be able to introduce multiple dimensions to the mapping function (which is what deep learning does). This is a very surface-level look at what deep learning does, but that should be enough to handle the first part of your question.
For #2, a good quick answer for you is that deep learning can be a part of each component of the chatbot framework depending on how complex your task is. If it's easy, then classical machine learning might be good enough to solve your problem. If it's hard, then you can begin to look into deep learning solutions.
Since it sounds like you want the chatbot to go a bit beyond simple input-output matching and handle complicated semantics like coreference resolution, your task seems sufficiently difficult and a good candidate for a deep learning solution. I wouldn't worry so much about identifying a specific solution for each of the chatbot framework steps because the tasks involved in each of those steps blend into one another with deep learning (e.g. a deep learning solution wouldn't need to classify intent and then manage dialogue, it would simply learn from hundreds of thousands of similar situations and apply a variation of the most similar response).
I would recommend handling the problem as a translation problem - but instead of translating from one language to another, you're translating from the input query to the output response. Translation frequently needs to resolve coreference and solutions people have used to solve that might be an ideal course of action for you.
Here are some excellent resources to read up on in order to frame your problem and how to solve it:
Google's Neural Machine Translation
Fine Tuning Tasks with BERT
There is always a trade-off between using traditional machine learning models and using deep learning models.
Deep learning models require large data to train and there will be an increase in training time & testing time. But it will give better results.
Traditional ML models work well with fewer data with moderate performance comparatively. The inference time is also less.
For Chatbots, latency matters a lot. And the latency depends on the application/domain.
If the domain is banking or finance, people are okay with waiting for a few seconds but they are not okay with wrong results. On the other hand in the entertainment domain, you need to deliver the results at the earliest.
The decision depends on the application domain + the data size you are having + the expected precision.
RASA is something worth looking into.

NLP for reliable text classification on raspberry pi

Trying to get up and running my very own smart room.
As of now the system is on raspi 3, Google STT, naive bayes for text classification, PoS/NER by nlp-compromise, bunch of APIs, and then eSpeak. (sure there are lot of other stages, but generally speaking)
One thing which is problematic though is the text classification. Though, NB is doing a fair job but yeah there are issues.
Various text classification heavily rely on the fact that there would be large corpora to train with. And this makes sense, particularly if the application is news categorisation, for example.
But here we are talking about spoken language. If the sentence is Tell me the weather, there's only so much corpus you can generate for the variation in that simple statement. And still, find some other way to ask for the weather.
I don't think for each category there can be a large datasets of statements which would help to make the device clearly distinguish between commands.
Question
What can I do here, since more categories (or skillsets) would mean more similar statements.
Since it is a classification problem, even using SVM or RNN or any other trick should not make any such difference, even if I have to rig an external GPU for it. The corpus is about spoken sentences for various categories and the dataset can't be expected to be diversely educative for the system.
But honestly I am not clear of what could be a reliable method for classification, only for such specific purposes.
PS - I have seen how Jasper works, but even that does not resolve to better "understanding" of categories, many times

Parsing nonuniform data

I am trying to parse a collection of data that has two (or one) useful pieces, but may be organized in many different ways:
V01C01
Vol 1 Chapter 1
Chapter 1 Volume 1 - Alt title
V1.1
etc.
I don't want to use a massive collection of regexs, because there is no way to predict all of the combinations of how things will be organized (also some will have extraneous text). I feel like there is a branch of machine learning that may be perfect for this, but I'm not experienced in it enough to know.
Well that is an interesting problem for sure and there are a couple of things you could try.
Making the assumption that you don't have labels on your data, then the first thing I would try to do, is to check the connections between each instance using a clustering algorithm like k-means (http://en.wikipedia.org/wiki/K-means_clustering), keep in mind that this wouldn't solve your problem but would help you to explore your data and hopefully find a set of features to train a supervised learning classifier.
In the case that you do have labels on your data, or you could manually tag your set. Then you are in front a more manageable problem. At first glance, it would look a lot like a text or document classification problem (like classify emails as Spam/NoSpam), in which case a naive bayes classifier could be a good first attempt to attack the problem since is a easy algorithm to implement and can provide reasonable good results.
About Naives Bayes Classifier (https://www.bionicspirit.com/blog/2012/02/09/howto-build-naive-bayes-classifier.html)
I made some assumptions here and I might be wrong based on that. Maybe if you clarify some points (like if you are able to manually tag the data) we would be able to help you further.

Best approach to what I think is a machine learning problem [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
I am wanting some expert guidance here on what the best approach is for me to solve a problem. I have investigated some machine learning, neural networks, and stuff like that. I've investigated weka, some sort of baesian solution.. R.. several different things. I'm not sure how to really proceed, though. Here's my problem.
I have, or will have, a large collection of events.. eventually around 100,000 or so. Each event consists of several (30-50) independent variables, and 1 dependent variable that I care about. Some independent variables are more important than others in determining the dependent variable's value. And, these events are time relevant. Things that occur today are more important than events that occurred 10 years ago.
I'd like to be able to feed some sort of learning engine an event, and have it predict the dependent variable. Then, knowing the real answer for the dependent variable for this event (and all the events that have come along before), I'd like for that to train subsequent guesses.
Once I have an idea of what programming direction to go, I can do the research and figure out how to turn my idea into code. But my background is in parallel programming and not stuff like this, so I'd love to have some suggestions and guidance on this.
Thanks!
Edit: Here's a bit more detail about the problem that I'm trying to solve: It's a pricing problem. Let's say that I'm wanting to predict prices for a random comic book. Price is the only thing I care about. But there are lots of independent variables one could come up with. Is it a Superman comic, or a Hello Kitty comic. How old is it? What's the condition? etc etc. After training for a while, I want to be able to give it information about a comic book I might be considering, and have it give me a reasonable expected value for the comic book. OK. So comic books might be a bogus example. But you get the general idea. So far, from the answers, I'm doing some research on Support vector machines and Naive Bayes. Thanks for all of your help so far.
Sounds like you're a candidate for Support Vector Machines.
Go get libsvm. Read "A practical guide to SVM classification", which they distribute, and is short.
Basically, you're going to take your events, and format them like:
dv1 1:iv1_1 2:iv1_2 3:iv1_3 4:iv1_4 ...
dv2 1:iv2_1 2:iv2_2 3:iv2_3 4:iv2_4 ...
run it through their svm-scale utility, and then use their grid.py script to search for appropriate kernel parameters. The learning algorithm should be able to figure out differing importance of variables, though you might be able to weight things as well. If you think time will be useful, just add time as another independent variable (feature) for the training algorithm to use.
If libsvm can't quite get the accuracy you'd like, consider stepping up to SVMlight. Only ever so slightly harder to deal with, and a lot more options.
Bishop's Pattern Recognition and Machine Learning is probably the first textbook to look to for details on what libsvm and SVMlight are actually doing with your data.
If you have some classified data - a bunch of sample problems paired with their correct answers -, start by training some simple algorithms like K-Nearest-Neighbor and Perceptron and seeing if anything meaningful comes out of it. Don't bother trying to solve it optimally until you know if you can solve it simply or at all.
If you don't have any classified data, or not very much of it, start researching unsupervised learning algorithms.
It sounds like any kind of classifier should work for this problem: find the best class (your dependent variable) for an instance (your events). A simple starting point might be Naive Bayes classification.
This is definitely a machine learning problem. Weka is an excellent choice if you know Java and want a nice GPL lib where all you have to do is select the classifier and write some glue. R is probably not going to cut it for that many instances (events, as you termed it) because it's pretty slow. Furthermore, in R you still need to find or write machine learning libs, though this should be easy given that it's a statistical language.
If you believe that your features (independent variables) are conditionally independent (meaning, independent given the dependent variable), naive Bayes is the perfect classifier, as it is fast, interpretable, accurate and easy to implement. However, with 100,000 instances and only 30-50 features you can likely implement a fairly complex classification scheme that captures a lot of the dependency structure in your data. Your best bet would probably be a support vector machine (SMO in Weka) or a random forest (Yes, it's a silly name, but it helped random forest catch on.) If you want the advantage of easy interpretability of your classifier even at the expense of some accuracy, maybe a straight up J48 decision tree would work. I'd recommend against neural nets, as they're really slow and don't usually work any better in practice than SVMs and random forest.
The book Programming Collective Intelligence has a worked example with source code of a price predictor for laptops which would probably be a good starting point for you.
SVM's are often the best classifier available. It all depends on your problem and your data. For some problems other machine learning algorithms might be better. I have seen problems that neural networks (specifically recurrent neural networks) were better at solving. There is no right answer to this question since it is highly situationally dependent but I agree with dsimcha and Jay that SVM's are the right place to start.
I believe your problem is a regression problem, not a classification problem. The main difference: In classification we are trying to learn the value of a discrete variable, while in regression we are trying to learn the value of a continuous one. The techniques involved may be similar, but the details are different. Linear Regression is what most people try first. There are lots of other regression techniques, if linear regression doesn't do the trick.
You mentioned that you have 30-50 independent variables, and some are more important that the rest. So, assuming that you have historical data (or what we called a training set), you can use PCA (Principal Componenta Analysis) or other dimensionality reduction methods to reduce the number of independent variables. This step is of course optional. Depending on situations, you may get better results by keeping every variables, but add a weight to each one of them based on relevant they are. Here, PCA can help you to compute how "relevant" the variable is.
You also mentioned that events that are occured more recently should be more important. If that's the case, you can weight the recent event higher and the older event lower. Note that the importance of the event doesn't have to grow linearly accoding to time. It may makes more sense if it grow exponentially, so you can play with the numbers here. Or, if you are not lacking of training data, perhaps you can considered dropping off data that are too old.
Like Yuval F said, this does look more like a regression problem rather than a classification problem. Therefore, you can try SVR (Support Vector Regression), which is regression version of SVM (Support Vector Machine).
some other stuff you can try are:
Play around with how you scale the value range of your independent variables. Say, usually [-1...1] or [0...1]. But you can try other ranges to see if they help. Sometimes they do. Most of the time they don't.
If you suspect that there are "hidden" feature vector with a lower dimension, say N << 30 and it's non-linear in nature, you will need non-linear dimensionality reduction. You can read up on kernel PCA or more recently, manifold sculpting.
What you described is a classic classification problem. And in my opinion, why code fresh algorithms at all when you have a tool like Weka around. If I were you, I would run through a list of supervised learning algorithms (I don't completely understand whey people are suggesting unsupervised learning first when this is so clearly a classification problem) using 10-fold (or k-fold) cross validation, which is the default in Weka if I remember, and see what results you get! I would try:
-Neural Nets
-SVMs
-Decision Trees (this one worked really well for me when I was doing a similar problem)
-Boosting with Decision trees/stumps
-Anything else!
Weka makes things so easy and you really can get some useful information. I just took a machine learning class and I did exactly what you're trying to do with the algorithms above, so I know where you're at. For me the boosting with decision stumps worked amazingly well. (BTW, boosting is actually a meta-algorithm and can be applied to most supervised learning algs to usually enhance their results.)
A nice thing aobut using Decision Trees (if you use the ID3 or similar variety) is that it chooses the attributes to split on in order of how well they differientiate the data - in other words, which attributes determine the classification the quickest basically. So you can check out the tree after running the algorithm and see what attribute of a comic book most strongly determines the price - it should be the root of the tree.
Edit: I think Yuval is right, I wasn't paying attention to the problem of discretizing your price value for the classification. However, I don't know if regression is available in Weka, and you can still pretty easily apply classification techniques to this problem. You need to make classes of price values, as in, a number of ranges of prices for the comics, so that you can have a discrete number (like 1 through 10) that represents the price of the comic. Then you can easily run classification it.

Resources