ML or rule-based [closed] - machine-learning

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
I already have 85% accuracy on my sklearn text classifier. What are the advantages and disadvantages of building a rule-based system? Could it save me from doing double the work? Maybe you can provide sources and evidence for each side, so that I can make the decision based on my circumstances. Again, I want to know when a rule-based approach is favorable versus when an ML-based approach is favorable. Thanks!

Here is an idea:
Instead of going one way or the other, you can set up a hybrid model. Look at the typical errors your machine learning classifier makes, and see if you can come up with a set of rules that capture those errors. Then run these rules on your input: if one of them applies, stop there; if not, pass the input on to the classifier.
In the past I did this with a probabilistic part-of-speech tagger. It's difficult to tune a probabilistic model, but it's easy to add a few pre- or post-processing rules to capture some consistent errors.
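A minimal sketch of the hybrid idea in Python, assuming the rules are plain regular expressions checked before the classifier (the pre-processing variant; a post-processing variant would instead override the classifier's output). The rule patterns, labels and `dummy_classifier` are hypothetical; with scikit-learn you would pass your fitted model's `predict` method as the fallback.

```python
import re
from typing import Callable, List, Optional, Sequence, Tuple

# A rule pairs a regex with the label it forces. The patterns used below are
# hypothetical stand-ins for rules you would derive from your classifier's errors.
Rule = Tuple[re.Pattern, str]


def make_rules(spec: Sequence[Tuple[str, str]]) -> List[Rule]:
    """Compile (pattern, label) pairs into case-insensitive regex rules."""
    return [(re.compile(pattern, re.IGNORECASE), label) for pattern, label in spec]


def apply_rules(text: str, rules: Sequence[Rule]) -> Optional[str]:
    """Return the label of the first rule that matches, or None if no rule fires."""
    for pattern, label in rules:
        if pattern.search(text):
            return label
    return None


def hybrid_predict(texts: Sequence[str], rules: Sequence[Rule],
                   classifier: Callable[[List[str]], Sequence[str]]) -> List[str]:
    """Apply rules first; send only the inputs no rule covers to the classifier."""
    results: List[Optional[str]] = [apply_rules(t, rules) for t in texts]
    todo = [i for i, label in enumerate(results) if label is None]
    if todo:  # one batched call, as you would do with model.predict(...)
        predicted = list(classifier([texts[i] for i in todo]))
        if len(predicted) != len(todo):
            raise ValueError("classifier must return one label per input")
        for i, label in zip(todo, predicted):
            results[i] = label
    return [str(label) for label in results]


# Hypothetical rules and a stand-in classifier (replace with your model's predict).
RULES = make_rules([(r"\brefund\b", "billing"), (r"\bpassword\b", "account")])


def dummy_classifier(batch: List[str]) -> List[str]:
    return ["other"] * len(batch)


print(hybrid_predict(["I want a refund", "Reset my password", "Hello there"],
                     RULES, dummy_classifier))
# → ['billing', 'account', 'other']
```

Because the rules run first and short-circuit, each rule you add only changes the inputs it matches; everything else still goes through the statistical model unchanged.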

https://www.linkedin.com/feed/update/urn:li:activity:6674229787218776064?commentUrn=urn%3Ali%3Acomment%3A%28activity%3A6674229787218776064%2C6674239716663156736%29
Yoel Krupnik (CTO & co-founder | smrt - AI For Accounting) writes:
I think it really depends on the specific problem. Some problems can be completely solved with rule-based logic; some require machine learning (often in combination with rule-based logic before or after).
The advantages of a rule-based approach are that it doesn't require labeled training data, it can quickly provide decent results to use as a benchmark, and it helps you better understand the problem for future labeling and for the text manipulations the ML algorithm will require.
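To illustrate the benchmark point, here is a minimal keyword-rule baseline in Python. The keyword lists, texts and labels are made up; labels are used only to score the rules, not to build them. The resulting accuracy is the number an ML classifier (such as the 85% one in the question) should clearly beat to justify the extra effort.

```python
import re
from typing import Dict, List, Sequence

# Hypothetical keyword lists. In practice they come from domain knowledge,
# which is why a rule-based baseline needs no labeled training data.
KEYWORDS: Dict[str, List[str]] = {
    "positive": ["great", "excellent", "love"],
    "negative": ["bad", "terrible", "hate"],
}


def rule_based_label(text: str, keywords: Dict[str, List[str]] = KEYWORDS,
                     default: str = "neutral") -> str:
    """Return the label whose keywords occur most often; ties or no hits give `default`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = {label: sum(tokens.count(w) for w in words) for label, words in keywords.items()}
    best = max(counts.values())
    winners = [label for label, c in counts.items() if c == best]
    return winners[0] if best > 0 and len(winners) == 1 else default


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Share of matching labels (the same quantity sklearn.metrics.accuracy_score reports)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Labels are needed only to *evaluate* the rules.
texts = ["I love this, excellent work", "terrible service, I hate it",
         "it arrived on Tuesday", "great but bad packaging"]
gold = ["positive", "negative", "neutral", "negative"]
preds = [rule_based_label(t) for t in texts]
print(preds, accuracy(gold, preds))
# → ['positive', 'negative', 'neutral', 'neutral'] 0.75
```

The last example shows the typical failure mode of such rules (mixed signals), which is exactly the kind of case worth inspecting before deciding whether ML is needed.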

Related

How to implement feature extraction in Julia [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I am trying to build a binary classifier using machine learning, and I want to derive additional features for my data from the correlated features (numerical attributes) I already have. I searched a lot but could not find a block of code that works for me.
What should I do?
I've looked into dimensionality reduction and found a library (Multivariate Statistics), but I did not really understand it and felt lost :D
No one can choose the exact method for you. There are many, many different ways of doing binary classification and feature extraction. If you feel overwhelmed by all the names that libraries such as Multivariate Statistics offer, take a look at a textbook on statistics and machine learning; understanding the methods is independent of the programming language.
Start with some simple methods such as principal component analysis (PCA) (MultivariateStats.jl provides it), then test others as you learn more about your data and the methods.
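Since the method is language-independent, here is a small dependency-free Python sketch of what PCA does with correlated features: center the data, build the covariance matrix, and take its top eigenvector (found here by power iteration) as the new derived feature direction. It only computes the first component and assumes the data are not constant; for real work you would use a library such as MultivariateStats.jl in Julia or `sklearn.decomposition.PCA` in Python.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def covariance_matrix(X: Matrix) -> Matrix:
    """Sample covariance (n - 1 denominator) of the columns of X (rows = observations)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in X]
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
            for a in range(d)]


def first_principal_component(X: Matrix, iters: int = 500, seed: int = 0) -> List[float]:
    """Top eigenvector of the covariance matrix via power iteration (assumes non-constant data)."""
    C = covariance_matrix(X)
    d = len(C)
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]  # strictly positive start vector
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(X: Matrix, v: List[float]) -> List[float]:
    """One derived feature per row: the centered observation projected onto v."""
    d = len(v)
    means = [sum(row[j] for row in X) / len(X) for j in range(d)]
    return [sum((row[j] - means[j]) * v[j] for j in range(d)) for row in X]


# Two strongly correlated (made-up) features collapsed into one derived feature.
X = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8]]
v = first_principal_component(X)
print(v, project(X, v))
```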
Some Julia libraries to take a look at: JuliaStats (https://github.com/JuliaStats) with its packages:
StatsBase for the most basic stuff
MultivariateStats for methods like PCA
StatsModels (and DataFrames) for statistical models
and many more
For neural networks there are Flux.jl and Knet.jl
For Clustering there is Clustering.jl
Then there are also bindings to the Python libraries TensorFlow (neural networks and more) and scikit-learn (all kinds of ML algorithms).
There are many more projects, but these are some that I think are important.

Prime numbers identifier with logistic regression [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
Is it possible to use logistic regression to identify prime numbers?
I'm trying to design a system with supervised logistic regression, using a predefined database of numbers and their classification (1 = prime, 0 = not prime). Using this data, I want the computer to apply this type of algorithm to identify other numbers that aren't classified in the database.
Is it possible, or am I trying to do something impossible?
Given the right network configuration and enough time, I don't know why it would be impossible.
It seems others have had success with different models and you might get a better idea from them:
Early success on prime number testing via artificial neural networks is presented in A Compositional Neural-network Solution to Prime-number Testing, László Egri, Thomas R. Shultz, 2006. The knowledge-based cascade-correlation (KBCC) network approach showed the most promise, although its practicality is eclipsed by other prime detection algorithms, which usually begin by checking the least significant bit (immediately halving the search) and then search, based on other theorems and heuristics, up to $\lfloor \sqrt{x} \rfloor$. The work was, however, continued in Knowledge Based Learning with KBCC (Shultz et al., 2006).
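For comparison, the classical approach the answer refers to can be sketched as plain trial division in Python: reject even numbers by checking the least significant bit, then test only odd divisors up to $\lfloor \sqrt{x} \rfloor$. This is a simple deterministic sketch, not an optimized primality test. It can also generate the labeled database (1 = prime, 0 = not prime) the question describes.

```python
import math


def is_prime(x: int) -> bool:
    """Deterministic trial division over odd divisors up to floor(sqrt(x))."""
    if x < 2:
        return False
    if x < 4:
        return True              # 2 and 3
    if (x & 1) == 0:             # least significant bit 0 -> even -> composite
        return False
    limit = math.isqrt(x)        # floor(sqrt(x)) without floating-point error
    for d in range(3, limit + 1, 2):
        if x % d == 0:
            return False
    return True


print([n for n in range(30) if is_prime(n)])
# → [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

# Labeled data in the question's format: (number, 1 = prime / 0 = not prime)
labels = [(n, int(is_prime(n))) for n in range(1000)]
```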

How to train a model to detect an event in a time line sequence of data [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 5 years ago.
I have a timeline of a user's data and I want to train a model to detect events.
For example, an event could be a gesture in a timeline of accelerometer data,
or
a timeline of looking at the time (checking a watch), labeled as nervous or calm.
What machine learning algorithm will be appropriate for this problem?
Thanks
This task is known as Event Detection and can be performed using Natural Language Processing (NLP) techniques.
There is no 'appropriate' or 'not appropriate' algorithm. You have to extract various features (e.g. Part-of-Speech tags) that enable the algorithm(s) to detect events. Then, you need to evaluate the implemented algorithms/models (assuming that you have also tuned the corresponding parameters for each algorithm) and decide which one is the best (in terms of performance). Also, you need to decide which features are helpful and which are not.
These papers might be a good starting point:
Machine Learning Algorithms for Event Detection
Event Detection Challenges, Methods, and Applications in Natural and Artificial Systems
There is no definitive answer as to which approach is best. Based on experience, my favourite approach to modelling series in general is LSTM nets. These work great with time events as long as you have enough data. One option is to look for anomalies: for this you could use an LSTM that triggers when something 'unexpected' happens. Another option would be to define different states (e.g. is.event = {0,1}) and train your LSTM as a normal classifier (check this question on Quora). You can, for example, use Keras to implement this easily in Python.
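Training a classifier on is.event = {0,1} first requires turning the timeline into fixed-length training examples. A minimal Python sketch of that step, assuming events are given as a 0/1 mask aligned with the series and that a window counts as an event if any of its time steps does (labeling by the last step is an equally valid choice). The readings are made up.

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], events: Sequence[int],
                 window: int, step: int = 1) -> Tuple[List[List[float]], List[int]]:
    """Slice a 1-D series into fixed-length windows with an is_event label each.

    A window is labeled 1 if any time step inside it is marked in the 0/1
    `events` mask (an assumption of this sketch), otherwise 0.
    """
    if len(series) != len(events):
        raise ValueError("series and events must have the same length")
    X, y = [], []
    for start in range(0, len(series) - window + 1, step):
        X.append(list(series[start:start + window]))
        y.append(int(any(events[start:start + window])))
    return X, y


readings = [0.1, 0.0, 0.2, 2.5, 3.1, 0.1, 0.0, 0.1]  # made-up accelerometer magnitudes
event_mask = [0, 0, 0, 1, 1, 0, 0, 0]                # hand-labeled gesture
X, y = make_windows(readings, event_mask, window=3)
print(y)  # → [0, 1, 1, 1, 1, 0]
```

For a Keras LSTM, `X` would then be reshaped to `(num_windows, window, num_features)`, here `(6, 3, 1)`, and paired with `y` as binary targets.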
If data is not so abundant, you can also try other nice sequential models like HMMs and HSMMs. These are also models that can be trained on labeled sequential data. An HSMM additionally models how long each state lasts, which, depending on your data, can be useful. As far as I know, scikit-learn only supports HMMs (that code has since moved to the separate hmmlearn package); however, there is an HSMM library available here.
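To make the HMM suggestion concrete, here is a minimal Python sketch of Viterbi decoding, the step that turns a trained HMM into per-time-step labels such as calm/event. It assumes a discrete HMM whose parameters are already known; the two states, the "low"/"high" observation symbols and all probabilities are hypothetical, and learning them from data (for example with hmmlearn) is not shown.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical two-state model: hidden states "calm"/"event", observations are a
# discretized signal ("low"/"high"). Real parameters would be learned from data.
STATES = ["calm", "event"]
START = {"calm": 0.8, "event": 0.2}
TRANS = {"calm": {"calm": 0.9, "event": 0.1},
         "event": {"calm": 0.3, "event": 0.7}}
EMIT = {"calm": {"low": 0.9, "high": 0.1},
        "event": {"low": 0.2, "high": 0.8}}


def _log(p: float) -> float:
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs: Sequence[str], states: Sequence[str], start_p: Dict[str, float],
            trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[str, float]]) -> List[str]:
    """Most likely hidden-state path of a discrete HMM, computed in log space."""
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: _log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in states}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: best[t - 1][r] + _log(trans_p[r][s]))
            best[t][s] = best[t - 1][prev] + _log(trans_p[prev][s]) + _log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Follow the back-pointers from the best final state
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


print(viterbi(["low", "low", "high", "high", "low"], STATES, START, TRANS, EMIT))
# → ['calm', 'calm', 'event', 'event', 'calm']
```

Working in log space avoids numerical underflow on long sequences; the transition probabilities are what let the model prefer contiguous event segments over isolated spikes.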
Finally, some remarks about processing your data. If you intend to do batch learning, any of the models suggested here should work fine. However, if you want to do on-line learning (meaning that you make predictions on the fly as data arrives), you will need to stick to LSTMs or, if you decide to use one of the Bayesian approaches, check this alternative: the paper on on-line HSMMs.
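The "prediction on the fly" part can be sketched independently of the model: keep a rolling buffer of the most recent samples and score it each time a new sample arrives. This minimal Python sketch covers only that prediction side with an already-trained, frozen model; it does not update the model's parameters. `threshold_model` is a hypothetical stand-in for a trained LSTM or similar, and the readings are made up.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, Sequence, Tuple


def stream_predictions(samples: Iterable[float], window: int,
                       predict: Callable[[Sequence[float]], int]) -> Iterator[Tuple[int, int]]:
    """Yield (time index, prediction) as each new sample arrives.

    Only the last `window` samples are kept; `predict` stands in for an
    already-trained model that scores one window. The model is not updated here.
    """
    buffer: Deque[float] = deque(maxlen=window)
    for t, x in enumerate(samples):
        buffer.append(x)            # the oldest sample drops out automatically
        if len(buffer) == window:   # start predicting once the first window is full
            yield t, predict(tuple(buffer))


def threshold_model(w: Sequence[float]) -> int:
    """Hypothetical stand-in model: flag a window whose peak exceeds 2.0."""
    return int(max(w) > 2.0)


readings = [0.1, 0.0, 0.2, 2.5, 3.1, 0.1, 0.0, 0.1]  # made-up sensor stream
for t, pred in stream_predictions(readings, window=3, predict=threshold_model):
    print(t, pred)
# prints one pair per line: 2 0, 3 1, 4 1, 5 1, 6 1, 7 0
```

Because `stream_predictions` is a generator, it works just as well on an unbounded iterator (for example, readings coming from a sensor) as on a list.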
Hope this helps!

Learning approach in machine learning [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
(homework problem)
Which of the following problems are best suited for the learning approach?
Classifying numbers into primes and non-primes.
Detecting potential fraud in credit card charges.
Determining the time it would take a falling object to hit the ground.
Determining the optimal cycle for traffic lights in a busy intersection
I'm trying to answer your question without doing your homework.
Basically you can think of machine learning as a way to extract patterns from data where all other approaches fail.
So, first clue: if there is an analytic way to solve the problem, don't use machine learning! The analytic algorithm will likely be faster, more efficient, and 100% correct.
Second clue: there has to be a pattern in the data. If you as a human see a pattern, machine learning can find it too. If lots of smart humans who are experts in the respective domain don't see a pattern, machine learning will most likely fail. Chaos cannot be learned, i.e. classified or predicted.
That should answer your question. Make sure to also read the summary on Wikipedia to get an idea of whether a problem can be solved using supervised, unsupervised, or reinforcement learning.

A basic query about data mining [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
Using data mining, we can find useful patterns in a large dataset using techniques like correlation, and there must be some open-source tools for this (what are some examples?).
Is this pull-based or push-based? That is, do we provide a dataset as well as specific queries as input to the data mining engine and it returns answers (as in SQL), or do we supply only a large dataset and the engine finds patterns on its own (patterns we never knew existed or couldn't formulate queries for), so that we don't pull anything with specific queries but it pushes the patterns to us?
A quick read of the Wikipedia article didn't clear up my doubts.
For open source, have a look at Weka.
In regards to the push-pull thing, well, it's a bit of both. But it's not quite that simple. You must be looking for something. E.g. if you are looking for clusters, there are unsupervised algorithms which will give you an answer with minimal guidance.
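As an illustration of "an answer with minimal guidance", here is a minimal Python sketch of k-means (Lloyd's algorithm), a classic unsupervised clustering method: the only guidance you give is the number of clusters `k`, and the algorithm returns the groups it finds. For simplicity it uses the first `k` points as initial centroids and assumes `k` is at most the number of points; library implementations such as Weka's SimpleKMeans or scikit-learn's `KMeans` (which defaults to k-means++ initialization) are more robust. The data are made up.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iters: int = 100) -> Tuple[List[int], List[Point]]:
    """Plain k-means (Lloyd's algorithm); initial centroids are simply the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance)
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
    return labels, centroids


DATA = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(DATA, k=2)
print(labels)  # → [0, 1, 0, 0, 1, 1]
```

Note that the algorithm still needs *some* guidance (the value of `k` and a distance measure), which matches the point that you must be looking for something.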
In practice things are more meaningful if you know about the data you analyse and you are looking at regularities and patterns that make sense.
Playing with Weka will give you a better idea of the range of possibilities.
Python and R are other great open-source tools that are very popular in data mining.
A great tool that I used recently is scikit-learn.
