I am new to this area. As I understand it:
Data mining means retrieving useful information from data with respect to a data model.
Machine learning seeks to identify behavioral patterns in data, and then build various models based on the observed patterns.
Is that roughly correct?
Also, Data Mining is often considered a sub-field of Machine Learning.
Data Mining usually goes only as far as interpreting the data (e.g. categorizing newspaper articles by theme, or books by the age of readers they suit). It is the part of Machine Learning that is given raw data and then, using Machine Learning methods, extracts some meaningful information from it.
Machine Learning in general can involve more steps than just interpreting the data. Programs developed with Machine Learning techniques can also act upon the knowledge "learned" from the data. For example, a program that is given a set of example Checkers games and, based on those, is able to play the game (well) has "learned" from the examples -- the data -- and can now interpret new (similar) data and act upon it.
The terms are not overly strict in definition, but basically I think what you're saying is correct.
Machine learning involves identifying and refining algorithms, whereas data mining implies a more static algorithm applied to fixed data. The output of machine learning is information, of course, but also new algorithms identified through the process. Data mining seeks to apply a pre-existing algorithm to data.
I want to do a website project that uses machine learning to optimize car throughput in a city. This would be a cartoonish grid of dots attempting to navigate through a grid of streets with stoplights at each intersection. However, I have not been able to find the right resources for learning about this type of ML optimization.
The idea to start is that the grid of stoplights is given the same set of cars each epoch and the stoplights guess their own frequency of green/red to maximize traffic flow. So the metric that the model will learn against is number of cars through the light (or time for all cars to clear the city, not sure yet).
I have done the Google ML Crash Course and read the book A Programmer's Guide to Artificial Intelligence, but I have yet to find the type of ML I am looking for: a learning resource on training a model with no labeled data, only a metric to optimize.
Reinforcement learning was what I was looking for, and I'm now looking into the TensorFlow documentation on how a virtual light signal can take actions and receive rewards from a model.
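To make the setup concrete, here is a minimal tabular Q-learning sketch for a single intersection. Everything in it is a hypothetical placeholder: the two-queue state, the drain/fill dynamics, and the cars-through-per-step reward all stand in for a real traffic simulator.

```python
import numpy as np

# Hypothetical single-intersection environment: state = (queue_NS, queue_EW)
# bucketed into a few levels; actions = keep the current phase or switch.
# The dynamics below are made-up stand-ins for a real traffic simulator.

N_BUCKETS = 5          # queue lengths discretized into 0..4
ACTIONS = [0, 1]       # 0 = keep phase, 1 = switch phase

q_table = np.zeros((N_BUCKETS, N_BUCKETS, 2, len(ACTIONS)))  # state x phase x action
alpha, gamma, eps = 0.1, 0.95, 0.1

def step(state, phase, action):
    """Toy transition: the green direction drains cars, the red one fills up."""
    ns, ew = state
    if action == 1:
        phase = 1 - phase
    if phase == 0:  # green for north-south
        reward = min(ns, 2)                                  # cars that get through
        ns = max(ns - 2, 0); ew = min(ew + 1, N_BUCKETS - 1)
    else:           # green for east-west
        reward = min(ew, 2)
        ew = max(ew - 2, 0); ns = min(ns + 1, N_BUCKETS - 1)
    return (ns, ew), phase, reward

for episode in range(2000):
    state, phase = (np.random.randint(N_BUCKETS), np.random.randint(N_BUCKETS)), 0
    for t in range(50):
        if np.random.rand() < eps:                           # epsilon-greedy exploration
            action = int(np.random.choice(ACTIONS))
        else:
            action = int(np.argmax(q_table[state[0], state[1], phase]))
        next_state, next_phase, reward = step(state, phase, action)
        best_next = q_table[next_state[0], next_state[1], next_phase].max()
        q_table[state[0], state[1], phase, action] += alpha * (
            reward + gamma * best_next - q_table[state[0], state[1], phase, action]
        )
        state, phase = next_state, next_phase
```

Scaling this up to a grid of lights usually means either one agent per intersection or a single agent over a much larger state, which is where the TensorFlow RL tooling comes in.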
My question is: given a particular dataset and a binary classification task, is there a way to choose a particular type of model that is likely to work best? For example, consider the Titanic dataset on Kaggle here: https://www.kaggle.com/c/titanic. Just by analyzing graphs and plots, are there any general rules of thumb for picking Random Forests vs. KNNs vs. neural nets, or do I just need to test them all and pick the best-performing one?
Note: I'm not talking about image data, since CNNs are obviously the best choice there.
No, you need to test different models to see how they perform.
The top algorithms, based on papers and Kaggle, seem to be boosting algorithms (XGBoost, LightGBM, AdaBoost), stacks of those together, or just Random Forests in general. But there are instances where Logistic Regression can outperform them all.
So just try them all. If the dataset is under 100k rows, you're not going to lose that much time, and you might learn something valuable about your data.
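As a concrete way to "just try them all", here is a minimal scikit-learn comparison sketch; the CSV path and feature columns are placeholders for however you preprocess the Titanic data.

```python
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("train.csv")                      # hypothetical path to the Kaggle file
X = df[["Pclass", "Age", "SibSp", "Parch", "Fare"]].fillna(0)  # placeholder features
y = df["Survived"]

models = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "rf": RandomForestClassifier(n_estimators=200, random_state=0),
    "gboost": GradientBoostingClassifier(random_state=0),
}

# 5-fold cross-validation gives a fairer comparison than a single split.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```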
I am trying to build a binary classifier using machine learning, and I want to derive new features for my data from the correlated (numerical) features I already have. I have searched a lot but could not find a block of code that works for me.
What should I do?
I've looked into dimensionality reduction and found a library (MultivariateStats), but I didn't really understand it and felt lost :D
No one will choose the exact method for you. There are many, many different ways of doing binary classification and feature extraction. If you feel overwhelmed by all the names that libraries such as MultivariateStats offer, take a look at a textbook on statistics and machine learning; understanding the methods is independent of the programming language.
Start with a simple method such as principal component analysis (PCA), which MultivariateStats.jl provides, then try others as you gain more knowledge of your data and the methods. A minimal sketch follows the list below.
Some Julia libraries to take a look at: JuliaStats (https://github.com/JuliaStats) with its parts:
StatsBase for the most basic stuff
MultivariateStats for methods like PCA
StatsModels (and DataFrames) for statistical models
and many more
For neural networks there are Flux.jl and Knet.jl.
For clustering there is Clustering.jl.
There are also bindings to the Python libraries TensorFlow (neural networks & more) and scikit-learn (all kinds of ML algorithms).
There are many more projects, but these are some that I think are important.
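Here is a minimal PCA sketch; it uses Python's scikit-learn (reachable from Julia through the bindings mentioned above), since the workflow is the same as with MultivariateStats.jl. The synthetic correlated columns are just stand-ins for your own numerical features.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Fake data: four columns that are noisy copies of one underlying signal,
# i.e. strongly correlated features like the ones described in the question.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([base + 0.1 * rng.normal(size=(200, 1)) for _ in range(4)])

X_scaled = StandardScaler().fit_transform(X)   # PCA is sensitive to feature scale
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)        # new, decorrelated features

print(pca.explained_variance_ratio_)           # most variance lands in component 1
```

The reduced columns can then be fed to any binary classifier in place of (or alongside) the original correlated features.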
I have a timeline of a user's data and I want to train a model to detect events.
For example, an event could be a gesture in a timeline of accelerometer data,
or
a timeline of checking the time (looking at a watch), labeled as nervous or calm.
Which machine learning algorithm would be appropriate for this problem?
Thanks
This task is known as Event Detection and can be performed using Natural Language Processing (NLP) techniques.
There is no universally 'appropriate' or 'inappropriate' algorithm. You have to extract various features (e.g. Part-of-Speech tags, in the case of text) that enable the algorithm(s) to detect events. Then you need to evaluate the implemented algorithms/models (assuming you have also tuned the corresponding parameters for each algorithm) and decide which one performs best. You also need to decide which features are helpful and which are not.
These papers might be a good starting point:
Machine Learning Algorithms for Event Detection
Event Detection Challenges, Methods, and Applications in Natural and Artificial Systems
There is no definitive answer as to what the best approach is. Based on experience, my favourite approach to modelling series in general is LSTM nets. These work great with timed events as long as you have enough data. You can either look for anomalies, using an LSTM that triggers when something 'unexpected' happens, or define different states (e.g. is_event = {0, 1}) and train your LSTM as an ordinary classifier (check this question on Quora). You can use, for example, Keras to implement this easily in Python.
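A minimal Keras sketch of that classifier variant, assuming the accelerometer stream has already been cut into fixed-length windows with a 0/1 event label per window (the window length, channel count, and random data are placeholders):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

WINDOW_LEN, N_CHANNELS = 50, 3      # placeholder window size and x/y/z channels

# Stand-in data: replace with your windowed accelerometer segments and labels.
X = np.random.randn(1000, WINDOW_LEN, N_CHANNELS).astype("float32")
y = np.random.randint(0, 2, size=(1000,))

model = keras.Sequential([
    layers.Input(shape=(WINDOW_LEN, N_CHANNELS)),
    layers.LSTM(32),                            # summarizes the whole window
    layers.Dense(1, activation="sigmoid"),      # P(window contains an event)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2)
```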
If data is not so abundant, you can also try other nice sequential models like HMMs and HSMMs. These are also supervised models that learn from sequential data. In the case of an HSMM you also take into account the time spent in each state, which, depending on your data, can be of use. As far as I know, scikit-learn only supports HMMs; however, there is an HSMM library available here.
Finally, some remarks about processing your data. If you intend to do batch learning, any of the models suggested here should work fine. However, if you want to do online learning (meaning you make predictions on the fly as data arrives), you will need to stick to LSTMs, or perhaps check this alternative if you decide to use a Bayesian approach: paper on online HSMM.
Hope this helps!
As per Pedro Domingos in his famous paper "A Few Useful Things to Know about Machine Learning", machine learning systems automatically learn programs from data.
But in my experience we are supplying algorithms like ANNs or SVMs, etc.
My question is: how is this "automating automation"?
Could someone shed some light on this with an example?
When you develop a machine learning algorithm, with an ANN or SVM or whatever, you don't tell your program how to solve your problem; you tell it how to learn to solve the problem.
SVMs and ANNs are ways to learn a solution to a problem, not ways to solve a problem.
So when people say "machine learning systems automatically learn programs from data", they mean that you never programmed a solution to your problem, but rather let the computer learn to do so.
To quote Wikipedia: "Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed."
https://en.wikipedia.org/wiki/Machine_learning
[Edit]
For example, let's take one of the simplest machine learning algorithms: linear regression in a 2D space.
The aim of this algorithm is to learn a linear function from a dataset of (x, y) pairs, so that when you give your system a new x, you get an approximation of what the real y would be.
But when you code a linear regression you never specify the linear function y = ax + b. What you code is a way for the program to deduce it from the dataset.
The linear function y = ax + b is the solution to your problem; the linear regression code is the way you are going to learn that solution.
https://en.wikipedia.org/wiki/Linear_regression
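A minimal sketch of that idea with NumPy: the fitting code below never hard-codes a or b, it only encodes how to recover them from whatever data arrives (the synthetic dataset is just for illustration).

```python
import numpy as np

# Synthetic data for illustration: the "true" program is y = 2x + 1 plus noise,
# but the fitting code below never sees those numbers directly.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2 * x + 1 + rng.normal(scale=0.5, size=100)

# Least-squares fit: this is the "how to learn", not the solution itself.
a, b = np.polyfit(x, y, deg=1)
print(f"learned program: y = {a:.2f}x + {b:.2f}")  # close to y = 2x + 1
```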
Machine learning development helps improve business operations as well as business scalability. A number of ML algorithms and artificial intelligence tools have gained tremendous popularity in the business analytics community. The machine learning market has grown thanks to faster and cheaper computational processing, easy availability of data, and affordable data storage.