How to do data exploration before choosing any Machine Learning algorithms [closed] - machine-learning

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 6 years ago.
Are there any tools that could help recognize the data distribution pattern, and then help decide which ML algorithms to choose?

Firstly, you have to understand Machine Learning as a field, and have some understanding of its subfields. If you don't intuitively understand your tools, you won't be able to identify when to use them.
The idea you're talking about is called exploratory data analysis, and it can be very approachable if you think about it the right way. Think about it in terms of the scientific method:
First, look over the data, and any documentation about it.
Then, come to some hypotheses about the patterns that might exist.
Based on your understanding of ML, brainstorm some approaches that might give some insight into your hypotheses. For example, if you see that your proposed dependent variable takes one of a few distinct values, you have a classification problem, and you should choose an appropriate approach based on your input data.
The tools that you might find useful are plentiful, but a good start could be the programming language R, or Python. Both are very strong data science tools. R has a steeper learning curve, but is built with data science in mind. Python, on the other hand, is very easy to pick up, but you have more choices to make with regard to ML and data science libraries. With Python, look into pandas for CSV and data manipulation, and TensorFlow, Theano or scikit-learn for data analysis and ML.
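As a rough illustration of the "look over the data, then form a hypothesis" steps above, here is a minimal sketch using only the Python standard library. The helper names `summarize_column` and `suggest_task`, the sample columns, and the `max_classes=10` threshold are all assumptions made for this example; in practice pandas offers much richer summaries, and no threshold replaces actually understanding the data.

```python
from collections import Counter
from statistics import mean, stdev


def is_number(v):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def summarize_column(values):
    """Return basic facts about one column: size, missing count, distinct values."""
    present = [v for v in values if v is not None]
    summary = {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }
    if len(present) >= 2 and all(is_number(v) for v in present):
        # Numeric column: report mean and sample standard deviation.
        summary["mean"] = mean(present)
        summary["stdev"] = stdev(present)
    else:
        # Categorical column: report the most frequent values.
        summary["top"] = Counter(present).most_common(3)
    return summary


def suggest_task(target_values, max_classes=10):
    """Crude heuristic (an assumption, not a rule): few distinct values -> classification."""
    present = [v for v in target_values if v is not None]
    distinct = len(set(present))
    if not all(is_number(v) for v in present) or distinct <= max_classes:
        return "classification"
    return "regression"


# Hypothetical sample columns, made up for illustration.
species = ["setosa", "virginica", "setosa", None, "versicolor", "setosa"]
petal_length = [1.4, 5.5, 1.3, 4.7, 4.5, 1.5]

print(summarize_column(species))
print(summarize_column(petal_length))
print(suggest_task(species))
```

The heuristic only looks at the target column; on small samples a continuous target can also have few distinct values, so treat the suggestion as a starting hypothesis to check against the documentation of the data.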
Hope this helps!

Related

How to implement feature extraction in Julia [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
I am trying to make a binary classifier using machine learning, and I am trying to derive new features for my data from the correlated features (numerical attributes) I already have. I have searched a lot but could not find a block of code that works for me.
What should I do?
I have looked into dimensionality reduction and found a library (Multivariate Statistics), but I did not understand it and felt lost :D
No one can choose the exact method for you. There are many, many different ways of doing binary classification and feature extraction. If you feel overwhelmed by all the names that libraries such as Multivariate Statistics offer, then take a look at a textbook on statistics and machine learning; understanding the methods is independent of the programming language.
Start with some simple methods such as principal component analysis (PCA), which MultivariateStats.jl provides, then test others as you gain more knowledge of your data and the methods.
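To make the PCA suggestion concrete, here is a minimal, language-independent sketch of the idea, written in plain Python rather than Julia and not using MultivariateStats.jl. It finds only the first principal component by power iteration on the covariance matrix. The function names and the sample data are assumptions made for this example; a real project would use a library implementation.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit direction, explained variance ratio) of the first principal component."""
    n, d = len(rows), len(rows[0])
    # Center each feature on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue,
    # provided the start vector is not orthogonal to it.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance captured along v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    return v, eigenvalue / total_variance


def project(rows, direction):
    """Project centered rows onto the direction: one new feature per row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [sum((r[j] - means[j]) * direction[j] for j in range(d)) for r in rows]


# Two strongly correlated numeric features (made-up data, second is roughly 2x the first).
rows = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
direction, ratio = first_principal_component(rows)
scores = project(rows, direction)
print("direction:", direction)
print("explained variance ratio:", ratio)
print("new single feature:", scores)
```

When two attributes are highly correlated, the first component captures nearly all of their variance, so the projected scores can stand in for both columns as a single derived feature.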
Some Julia libraries to take a look at: JuliaStats (https://github.com/JuliaStats) with its parts
StatsBase for the most basic stuff
MultivariateStats for methods like PCA
StatsModels (and DataFrames) for statistical models
many more ....
For neural networks there are Flux.jl and Knet.jl
For clustering there is Clustering.jl
Then, there are also bindings to the Python libraries TensorFlow (neural networks and more) and scikit-learn (all kinds of ML algorithms)
There are many more projects, but these are some that I think are important.

Any advice for Beginner Programmer studying Deep Learning? [closed]

Closed 4 years ago.
Thanks for making it this far on my post!
I am studying engineering, yet have a passion for programming and wish to implement computer science knowledge into my own research.
My question is pertaining to any resources that this community has available and any advice you all are willing to give regarding getting started in this broad field.
I’m mainly confused about ‘neural networks’ in relation to Deep Learning as well as implementation of algorithms.
I have slight Python and R knowledge.
Note: one of the subfora of StackExchange is probably a better fit for this question.
In any case, for ML you can do just fine with basic Python/R. Most of the research and work done on ML is currently (2018) based on TensorFlow and similar frameworks. You don't really need a strong programming background to set up and train models on these frameworks (although it certainly helps). Actually, math/statistics will help you more, especially if you want to get to the bottom of it (i.e. reading the latest articles/papers, etc.).
Mainly I’m confused about ‘neural networks’ in relation to Deep Learning
"Deep Learning" is basically taking advantage of modern computing capabilities to train complex models (e.g. neural networks with many hidden layers) which a few years ago (e.g. 10 years ago) were infeasible. Informally speaking, the more complex your network is, the more interesting the things it can learn.
as well as implementation of algorithms.
Typically, you will use an existing framework; you won't implement the algorithms yourself. That said, implementing a multilayer perceptron by yourself is always a good and fun learning exercise.
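For anyone who wants to try that exercise, here is a minimal sketch of a multilayer perceptron in plain Python: two inputs, one hidden layer of two sigmoid units, one sigmoid output, trained by full-batch gradient descent on XOR. The class name `TinyMLP`, the fixed starting weights, the learning rate and the epoch count are all arbitrary choices made for this example, and such a small network is not guaranteed to solve XOR perfectly.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """2 inputs -> 2 hidden sigmoid units -> 1 sigmoid output."""

    def __init__(self):
        # Fixed, asymmetric starting weights keep the run reproducible.
        self.w1 = [[0.5, -0.4], [0.3, 0.8]]  # w1[h][i]: input i -> hidden unit h
        self.b1 = [0.1, -0.1]
        self.w2 = [0.7, -0.6]                # hidden unit h -> output
        self.b2 = 0.05

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def loss(self, data):
        # Mean squared error over the data set.
        return sum((self.forward(x)[1] - y) ** 2 for x, y in data) / len(data)

    def train_epoch(self, data, lr):
        # Accumulate gradients of the mean squared error over the full batch.
        gw1 = [[0.0, 0.0], [0.0, 0.0]]
        gb1 = [0.0, 0.0]
        gw2 = [0.0, 0.0]
        gb2 = 0.0
        for x, y in data:
            hidden, out = self.forward(x)
            # Backpropagation: error signal at the output pre-activation.
            delta_out = 2.0 * (out - y) * out * (1.0 - out) / len(data)
            gb2 += delta_out
            for h in range(2):
                gw2[h] += delta_out * hidden[h]
                # Error signal at hidden unit h (chain rule through w2 and sigmoid).
                delta_h = delta_out * self.w2[h] * hidden[h] * (1.0 - hidden[h])
                gb1[h] += delta_h
                for i in range(2):
                    gw1[h][i] += delta_h * x[i]
        # Gradient descent step.
        self.b2 -= lr * gb2
        for h in range(2):
            self.w2[h] -= lr * gw2[h]
            self.b1[h] -= lr * gb1[h]
            for i in range(2):
                self.w1[h][i] -= lr * gw1[h][i]


XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]

model = TinyMLP()
before = model.loss(XOR)
for _ in range(2000):
    model.train_epoch(XOR, lr=0.5)
after = model.loss(XOR)
print("loss before:", before, "loss after:", after)
```

Writing the forward pass and the chain-rule updates by hand like this is what frameworks such as TensorFlow automate; comparing the two is a useful way to see what "training a network" actually does.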

Machine Learning on financial big data [closed]

Closed 5 years ago.
Disclaimer: although I know some things about big data and am currently learning some other things about machine learning, the specific area that I wish to study is vague, or at least appears vague to me now. I'll do my best to describe it, but this question could still be categorised as too vague or not really a question. Hopefully, I'll be able to reword it more precisely once I get a reaction.
So,
I have some experience with Hadoop and the Hadoop stack (gained via using CDH), and I'm reading a book about Mahout, which is a collection of machine learning libraries. I also think I know enough statistics to be able to comprehend the math behind the machine learning algorithms, and I have some experience with R.
My ultimate goal is making a setup that would make trading predictions and deal with financial data in real time.
I wonder if there're any materials that I can further read to help me understand ways of managing that problem; books, video tutorials and exercises with example datasets are all welcome.
Take the ML course on Coursera. It is a good introduction to ML algorithms and will show you what ML can do, along with some general approaches:
https://www.coursera.org/course/ml
Also, to get a broader picture, I suggest Coursera's Data Science course:
https://www.coursera.org/course/datasci
Finally, a good book is Mahout in Action. It is more about solving practical problems with Mahout and has lots of examples and case studies.
I believe that after that you will have a better understanding of what you want to do next.

Probability basics for machine learning [closed]

Closed 7 years ago.
I have recently started studying Machine Learning and found that I need to refresh probability basics such as conditional probability, Bayes' theorem, etc.
I am looking for online resources where I can quickly brush up on probability concepts with respect to Machine Learning.
The online resources I have stumbled upon are either very basic or too advanced.
This might help: http://www.cs.cmu.edu/~tom/10601_fall2012/lectures.shtml
The above link is from Tom Mitchell's Machine Learning class at CMU. Videos are available too. You will gain a very good understanding of ML concepts if you go through all the videos (or just the first few videos for conditional probability, Bayes' theorem, etc.).
The notions of conditional probability and Bayes' theorem are very basic themselves. It doesn't get any more basic than that in probabilistic modeling, you might say, which suggests that you didn't look too well at what you've found or didn't really do any search at all.
Off the top of my head, I can name two resources: first, any Coursera course dealing with probabilities or machine learning (see AI, Statistics One or Probabilistic Graphical Models) contains these preliminaries. Second, there's a number of books on statistics freely available online, one example being Information Theory, Inference, and Learning Algorithms.
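As a quick refresher on the two concepts named in the question, here is a small worked example of Bayes' theorem, P(H | E) = P(E | H) P(H) / P(E), where P(E) = P(E | H) P(H) + P(E | not H) P(not H). The function name `posterior` and the numbers (a 1% prior, a 95% true-positive rate, a 5% false-positive rate) are made up for illustration.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | E) from P(H), P(E | H) and P(E | not H), via Bayes' theorem."""
    # Law of total probability gives the denominator P(E).
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical diagnostic test: rare condition, fairly accurate test.
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(f"P(condition | positive test) = {p:.3f}")  # prints 0.161
```

Even with a fairly accurate test, the posterior stays low because the condition is rare; the same conditioning step underlies classifiers such as naive Bayes.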

A basic query about data mining [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Using data mining, we are able to find useful patterns in a large set of data using techniques like correlation, and there must exist some open source tools for this (what are some examples?).
Is this pull-based or push-based? I mean, do we provide a data set as well as specific queries as input to the data mining engine, which then gives us answers (as in SQL)? Or do we only supply a large data set as input, and the engine finds patterns on its own (patterns which we never knew existed and/or could not formulate queries for), so that we don't pull specific answers from it and it instead pushes the patterns to us?
A quick reading of the Wikipedia article didn't clear up my doubts.
For an open source tool, have a look at Weka.
In regards to the push-pull thing, well, it's a bit of both. But it's not quite that simple. You must be looking for something. E.g. if you are looking for clusters, there are unsupervised algorithms which will give you an answer with minimal guidance.
In practice things are more meaningful if you know about the data you analyse and you are looking at regularities and patterns that make sense.
Playing with Weka will give you a better idea of the range of possibilities.
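To illustrate the "minimal guidance" case mentioned above, here is a small sketch of an unsupervised algorithm, k-means clustering, on one-dimensional data in plain Python. You supply only the data and the number of clusters `k`; the algorithm returns the groups it finds. The function name `kmeans_1d`, the evenly spaced starting centroids, and the sample points are assumptions made for this example, and it requires `k >= 2`.

```python
def kmeans_1d(points, k, iterations=20):
    """Cluster 1-D points into k groups; returns (centroids, clusters). Requires k >= 2."""
    # Deterministic start: spread the initial centroids evenly between min and max.
    lo, hi = min(points), max(points)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: abs(p - centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # assignments are stable, so we are done
        centroids = new_centroids
    return centroids, clusters


# Made-up measurements with two obvious groups.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centroids, clusters = kmeans_1d(points, k=2)
print("centroids:", centroids)
print("clusters:", clusters)
```

Note that even here the user still chooses what to look for (clusters) and how many, which is the "bit of both" point: the engine finds the groups, but you decide which kind of pattern is worth finding.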
Python and R are other great open source tools that are very popular in the data mining area.
A great tool that I used recently is scikit-learn.
