Learning approach in machine learning [closed] - machine-learning

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
(homework problem)
Which of the following problems are best suited for the learning approach?
Classifying numbers into primes and non-primes.
Detecting potential fraud in credit card charges.
Determining the time it would take a falling object to hit the ground.
Determining the optimal cycle for traffic lights in a busy intersection.

I'm trying to answer your question without doing your homework.
Basically you can think of machine learning as a way to extract patterns from data where all other approaches fail.
So first clue here: If there is an analytic way to solve the problem then don't use machine learning! The analytic algorithm will likely be faster, more efficient, and 100% correct.
The second clue: there has to be a pattern in the data. If you as a human see a pattern, machine learning can find it too. If lots of smart humans who are experts in the respective domain don't see a pattern, then machine learning will most likely fail. Chaos cannot be learned, i.e. classified or predicted.
That should answer your question. Make sure to also read the summary on Wikipedia to get an idea of whether a problem can be solved using supervised, unsupervised, or reinforcement learning.
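To make the first clue concrete, here is a minimal sketch of what "an analytic way to solve the problem" can look like. It assumes an object dropped from rest, constant gravity, and no air resistance; the function name `fall_time` and the default value of `g` are illustrative choices, not part of the original question.

```python
import math


def fall_time(height_m: float, g: float = 9.81) -> float:
    """Seconds for an object dropped from rest to fall height_m metres.

    Assumption: constant gravity g (m/s^2) and no air resistance,
    so h = g * t**2 / 2 and therefore t = sqrt(2 * h / g).
    """
    if height_m < 0:
        raise ValueError("height must be non-negative")
    return math.sqrt(2.0 * height_m / g)


# Closed-form answer: no training data and no model fitting required.
print(fall_time(20.0))
```

When a formula like this exists, a learned model can at best approximate it from examples.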

Related

ML or rule based [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
I already have 85% accuracy on my sklearn text classifier. What are the advantages and disadvantages of building a rule-based system instead? Could it save me from doing double the work? Maybe you can provide sources and evidence for each side, so that I can make the decision based on my circumstances. Again, I want to know when a rule-based approach is favorable and when an ML-based approach is favorable. Thanks!
Here is an idea:
Instead of going one way or another, you can set up a hybrid model. Look at typical errors your machine learning classifier makes, and see if you can come up with a set of rules that capture those errors. Then run these rules on your input, and if they applied, finish there; if not, pass the input on to the classifier.
In the past I did this with a probabilistic part-of-speech tagger. It's difficult to tune a probabilistic model, but it's easy to add a few pre- or post-processing rules to capture some consistent errors.
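The hybrid setup described above could be sketched roughly as follows. The rules, labels, and the `dummy_classifier` stand-in are all hypothetical and only show the control flow: rules run first, and the classifier is consulted only when no rule applies.

```python
import re
from typing import Callable, Optional

# Hypothetical hand-written rules (pattern, label), e.g. derived from
# inspecting the classifier's typical errors. Not from the original post.
RULES = [
    (re.compile(r"\bunsubscribe\b", re.IGNORECASE), "spam"),
    (re.compile(r"\binvoice\s+#\d+", re.IGNORECASE), "billing"),
]


def apply_rules(text: str) -> Optional[str]:
    """Return the label of the first matching rule, or None if none match."""
    for pattern, label in RULES:
        if pattern.search(text):
            return label
    return None


def hybrid_predict(text: str, classifier: Callable[[str], str]) -> str:
    """Use a rule if one applies; otherwise fall back to the classifier."""
    label = apply_rules(text)
    if label is not None:
        return label
    return classifier(text)


def dummy_classifier(text: str) -> str:
    """Stand-in for a trained model; always predicts 'other'."""
    return "other"


print(hybrid_predict("Click here to unsubscribe", dummy_classifier))
print(hybrid_predict("See you at lunch", dummy_classifier))
```

With a fitted scikit-learn pipeline (assumed here to be called `pipeline`), the fallback could be something like `lambda t: pipeline.predict([t])[0]`.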
https://www.linkedin.com/feed/update/urn:li:activity:6674229787218776064?commentUrn=urn%3Ali%3Acomment%3A%28activity%3A6674229787218776064%2C6674239716663156736%29
Yoel Krupnik (CTO & co-founder | smrt - AI For Accounting) writes:
I think it really depends on the specific problem. Some problems can be completely solved with rule based logic, some require machine learning (often in combination with rule based logic before or after).
The advantages of the rule-based approach are that it doesn't require labeled training data, may quickly provide decent results that can serve as a benchmark, and helps you better understand the problem for the future labeling and text manipulation required by the ML algorithm.

How come a small dataset has a high variance? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 1 year ago.
Why does a small dataset have a high variance? Our professor once said it. I just did not understand it. Any help would be greatly appreciated.
Thanks in advance.
If your dataset is small and you train your model to fit it, it is easy to run into overfitting problems. If your dataset is big enough, a little overfitting may not be a big problem, but in a small dataset it is.
Every single one of us, by the time we enter our professional careers, has been exposed to a larger visual dataset than the largest dataset available to AI researchers. On top of this, we have sound, smell, touch, and taste data all coming in from our external senses. In summary, humans have a lot of context on the human world. We have a general common-sense understanding of human situations. When analyzing a dataset, we combine the data itself with our past knowledge in order to come up with an analysis.
The typical machine learning algorithm has none of that — it has only the data you show to it, and that data must be in a standardized format. If a pattern isn’t present in the data, there is no way for the algorithm to learn it. That's why when given a small dataset it is more prone to error.
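One way to see the "high variance" claim numerically is to repeat the same estimate on many independently drawn datasets and look at how much the result moves. The sketch below is an illustration only: it uses the sample mean of standard normal data as a stand-in for a fitted model parameter, and the sample sizes, trial count, and seed are arbitrary assumptions.

```python
import random
import statistics


def spread_of_estimates(sample_size: int, trials: int = 500, seed: int = 0) -> float:
    """Variance of the sample mean across many independently drawn datasets."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        # Draw a fresh dataset of the given size from the same distribution.
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return statistics.variance(means)


small = spread_of_estimates(10)    # estimates from tiny datasets
large = spread_of_estimates(1000)  # estimates from large datasets
print(small, large)
```

The estimate from small datasets varies far more from draw to draw, which is the sense in which a model trained on little data depends heavily on which particular examples it happened to see.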

Prime numbers identifier with logistic regression [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
Is it possible to use logistic regression to identify prime numbers?
I'm trying to design a system using supervised logistic regression with a predefined database of numbers and their classification (1 = Prime, 0 = Not Prime). Using this data, I want the computer to use this type of algorithm to identify other numbers that aren't classified in the DB.
Is it possible, or am I trying to do something impossible?
Given the right network configuration and enough time, I don't know why it would be impossible.
It seems others have had success with different models and you might get a better idea from them:
Early success on prime number testing via artificial networks is presented in A Compositional Neural-network Solution to Prime-number Testing, László Egri, Thomas R. Shultz, 2006. The knowledge-based cascade-correlation (KBCC) network approach showed the most promise, although the practicality of this approach is eclipsed by other prime detection algorithms that usually begin by checking the least significant bit, immediately reducing the search by half, and then searching based on other theorems and heuristics up to $\lfloor \sqrt{x} \rfloor$. However, the work was continued with Knowledge Based Learning with KBCC, Shultz et al. 2006.
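For comparison, the conventional approach mentioned above (check the least significant bit, then search up to the floor of the square root) can be sketched as plain trial division. This is a minimal illustration, not the method from the cited paper, and it is not tuned for large inputs.

```python
import math


def is_prime(n: int) -> bool:
    """Deterministic primality test by trial division."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3 are prime
    if n & 1 == 0:
        return False  # least significant bit is 0, so n is even
    # Only odd divisors up to floor(sqrt(n)) need to be checked.
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


print([n for n in range(30) if is_prime(n)])
```

Unlike a trained classifier, this gives an exact answer for every input and needs no labeled data.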

How to write a program that outputs source code [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
This might not be the right place for this to ask, but I am interested in artificial neural networks and want to learn more.
How do you design a network and train it on source code so it can come up with programs for, for example, easy number theory problems?
What's the general name of this research field?
This is a hugely interesting, and very hard, problem area. It will probably take you months to read enough to even understand how to attack the problem. Here are a few things that might help you get started; they are meant more to show the problems you will face than to provide solutions:
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Then read this, and related papers:
https://arxiv.org/pdf/1410.5401v2.pdf
Next, you probably want to read the classic papers in program synthesis and generation at the parse tree/AST level (mostly out of MIT, I think, in the early 90s.)
Best of luck. This is not trivial.

Authorship Attribution using Machine Learning [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I am working on a practical machine learning problem as an exercise. I just need help formulating my problem.
I have the text of 20 books by a famous old author. There are 5 more books whose authorship has been debated throughout history: do they belong to the same author or not?
I am thinking about the best way to represent this problem. I am thinking of using a bag-of-words approach to find the most significant words used by the author.
Should I treat it as a Naive Bayes (Spam/Ham) problem, or should I use KNN classification (Author/non-author) to detect the class of each document? Is there another way of doing it?
I think Naive Bayes can give you insights. Another way is to find features that separate such books, for example:
1. Complexity of words: some writers are easy to understand and use common words. I am hinting at IDF (inverse document frequency).
2. Some words, like "selfie" or "mobile", may not even have existed in his time.
Try to find a lot of features like that; you can then also train a discriminative classifier.
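As a rough illustration of the IDF hint, the sketch below computes idf(t) = log(N / df(t)) over a tiny made-up corpus. The three sentences are hypothetical stand-ins for the books, and the tokenizer is deliberately simplistic.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def inverse_document_frequency(documents):
    """Map each term to log(N / df), where df is the number of documents containing it."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # Count each term at most once per document.
        doc_freq.update(set(tokenize(doc)))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


# Hypothetical toy corpus standing in for the books.
docs = [
    "the sea was calm and the ship sailed",
    "the ship sank in the storm",
    "the poet wrote of the storm",
]
idf = inverse_document_frequency(docs)
for term in ("the", "storm", "poet"):
    print(term, idf[term])
```

Terms that appear in every document get an IDF of zero, while rarer terms score higher; per-book averages of such scores are one possible "complexity of words" feature to feed into a classifier.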
