Where can I find data pairs to practice implementing linear regression? - machine-learning

I've recently started learning machine learning algorithms. I've written a program in Python from scratch to implement linear regression, but I need some data pairs to use.

There are many datasets available on the internet.
Have a look here, where you can find many real datasets: UCI Machine Learning Repository

You can use scikit-learn, which has some good built-in datasets. You can refer to this document.
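As a minimal sketch of the scikit-learn suggestion, assuming scikit-learn is installed: `load_diabetes` is one of the bundled datasets and needs no download. The choice of this dataset, and of column 2 (body mass index) as the single input, is my own assumption, made only to get (x, y) pairs for a one-variable regression.

```python
from sklearn.datasets import load_diabetes

# Load a dataset that ships with scikit-learn (no network access needed).
X, y = load_diabetes(return_X_y=True)

# Assumption for illustration: use one feature column (index 2, body mass
# index) as x, so that each sample becomes a single (x, y) pair.
x = X[:, 2]
pairs = list(zip(x.tolist(), y.tolist()))

print(len(pairs))   # number of data pairs
print(pairs[:3])    # first few pairs to feed into a from-scratch regression
```

Any other column, or another bundled dataset, would work the same way.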

Related

Using Machine Learning for Price Prediction

What machine learning method should I use to predict prices such as stocks, gold, etc.?
I prefer using Python, but I can't find a starting point. It seems so complicated to me and I have no clue how to start.
As for the machine learning method: regression is used for price prediction because it predicts a continuous variable. There is a wide range of regression techniques in machine learning, from simple linear regression, SVR, random forests and CatBoost to RNNs. Which one to use depends on the target problem, the available datasets and the computing resources.
Yes, Python is the best language for getting started with machine learning. And linear regression is definitely the best way to start on this regression task if you are new. You can then gradually explore the other techniques in scikit-learn before jumping straight to RNNs. Scikit-learn is the best machine learning library for everyone from beginners to professionals.
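To make the linear regression starting point concrete, here is a minimal from-scratch sketch of ordinary least squares with one input. The day/price numbers are made up and perfectly linear so that the result is easy to check; real prices are far noisier, and the helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: day index vs. price (exactly price = 2 * day + 8).
days = [1, 2, 3, 4, 5]
prices = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(days, prices)
next_price = slope * 6 + intercept  # extrapolate to day 6
print(slope, intercept, next_price)
```

The same fit is available as `LinearRegression` in scikit-learn once the mechanics are clear.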

How to determine whether to use machine learning algorithms or data mining techniques for a given scenario?

I have been reading many articles on machine learning and data mining over the past few weeks: articles on the differences between ML and DM, their similarities, and so on. But I still have one question, which may look silly:
How do I determine when to use ML algorithms and when to use DM?
I ask because I have done some practical DM work using Weka on time series analysis (future population prediction, sales prediction), text mining using R/Python, etc. The same can be done with ML algorithms too, for example future population prediction using linear regression.
So how do I determine whether ML or DM is better suited to a given problem?
Thanks in advance.
Probably the closest thing to the quite arbitrary and meaningless separation of ML and DM is unsupervised methods vs. supervised learning.
Choose ML if you have training data for your target function.
Choose DM when you need to explore your data.

Will it be justifiable to use deep learning for 1-D labelled data?

I have been using SVM for training and testing one dimensional data (15000 sample points for training, 7500 sample points for testing) and it has brought up satisfactory results so far. But to improve on the results, I am thinking of using Deep Learning for the same. Will it be able to improve results? What should I study for a quick implementation of Deep Learning algorithms? I am new to the DL field but want a quick implementation, if at all it is justifiable.
In machine learning applications it is hard to say in advance whether an algorithm will improve the results, because the results depend heavily on the data. There is no single best algorithm. You should follow the steps below:
Analyze your data
Apply the appropriate algorithms, guided by your machine learning background
Evaluate the results
There are many machine learning libraries for different programming languages, e.g. Weka for Java and scikit-learn for Python. An implementation may go by a specific name rather than an umbrella term like "deep learning", so look up the implementation you need in the documentation of the library you are using.
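The analyze/apply/evaluate steps above can be sketched on 1-D labelled data as follows. Everything here is an assumption for illustration: the data is synthetic, and the model is a deliberately simple threshold rule rather than an SVM or a deep network. The point is only that any candidate model should be scored on held-out data before deciding whether it is an improvement.

```python
import random


def make_data(n, rng):
    """Hypothetical 1-D labelled data: class 1 tends to have larger x."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = rng.gauss(2.0 if label else 0.0, 1.0)
        data.append((x, label))
    return data


def accuracy(data, t):
    """Fraction of samples where the rule 'predict 1 if x > t' is right."""
    correct = sum((x > t) == bool(label) for x, label in data)
    return correct / len(data)


def fit_threshold(train):
    """Try every training x as a threshold; keep the most accurate one."""
    best_t, best_acc = None, -1.0
    for t, _ in train:
        acc = accuracy(train, t)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


rng = random.Random(0)        # fixed seed so runs are repeatable
train = make_data(300, rng)   # data used to fit the model
test = make_data(150, rng)    # held-out data used only for evaluation

threshold = fit_threshold(train)
test_acc = accuracy(test, threshold)
print(f"threshold={threshold:.2f}, held-out accuracy={test_acc:.2f}")
```

A more complex model is only justified if it beats a simple baseline like this on the same held-out set.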

Training a decision tree

I am trying to get started with Machine Learning. I have some training data representing pixel values of digits in images and I am trying to train a decision tree out of this. What would be a good way of getting started? What tools should I consider (pointers on related documentation would help)? I also want to train a random forest on the data to compare performance versus decision tree. Any guidance would be of great help.
The best way to get started is probably Weka. Apart from offering implementations of a random forest classifier as well as several decision trees (among lots of other algorithms), it also provides tools for processing and visualizing the data. It comes with a relatively easy to use GUI.
The random forest uses trees, so I'd counsel you to get the trees working first. Once you know all about trees, you can read about forests and it will be very straightforward. However, you should start by learning about machine learning rather than just jumping into a library. I would start by understanding how to use decision trees on Boolean features (much simpler), choosing each split by maximizing information gain (that is, the reduction in entropy). Once you understand that algorithm well enough to run it by hand on a small dataset, read up on how to use decision trees on real-valued features. Then check out the library.
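The entropy-based split choice on Boolean features can be sketched as follows. This covers only one step of tree building (picking the feature with the highest information gain, i.e. the largest reduction in entropy), and the tiny dataset is made up so that it can be checked by hand.

```python
from math import log2


def entropy(labels):
    """Binary entropy (in bits) of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p in (0.0, 1.0):
        return 0.0  # a pure node has zero entropy
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def information_gain(rows, labels, feature):
    """Entropy before the split minus weighted entropy after it."""
    false_side = [y for r, y in zip(rows, labels) if not r[feature]]
    true_side = [y for r, y in zip(rows, labels) if r[feature]]
    n = len(labels)
    remainder = (len(false_side) / n) * entropy(false_side) \
        + (len(true_side) / n) * entropy(true_side)
    return entropy(labels) - remainder


# Made-up data: two Boolean features per row; the label copies feature 0.
rows = [(True, True), (True, False), (False, True), (False, False)]
labels = [1, 1, 0, 0]

gains = [information_gain(rows, labels, f) for f in range(2)]
best = max(range(2), key=lambda f: gains[f])
print(gains, best)
```

Feature 0 separates the labels perfectly, so it gets the full 1 bit of gain, while feature 1 gets none. A full tree repeats this choice recursively on each side of the split.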

How to categorize continuous data?

I have two dependent continuous variables and I want to use their combined values to predict the value of a third, binary variable. How do I go about discretizing/categorizing the values? I am not looking for clustering algorithms; I'm specifically interested in obtaining 'meaningful' discrete categories that I can subsequently use in a Bayesian classifier.
Pointers to papers, books, online courses, all very much appreciated!
That is the essence of machine learning and one of the most studied problems.
Least-squares regression, logistic regression, SVMs and random forests are widely used for this type of problem, which is called binary classification.
If your goal is to classify your data pragmatically, several libraries are available, such as scikit-learn in Python and Weka in Java. They have great documentation.
But if you want to understand the internals of machine learning, just search (here or on Google) for machine learning resources.
If you wanted to be a real nerd, you could generate a bunch of different possible discretizations, train a classifier on each, then characterize the discretizations by features and run a classifier on that, to see what sort of discretizations work best!?
In general, discretizing is more of an art, and it requires a good understanding of what the input variable ranges mean.
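As a starting point for generating candidate discretizations, here is a minimal sketch of two common binning schemes: equal-width and equal-frequency. Neither is named in the answers above; they are my assumption of reasonable baselines, and the function names and sample values are hypothetical.

```python
def equal_width_edges(values, k):
    """Interior bin edges that split [min, max] into k equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_frequency_edges(values, k):
    """Interior bin edges that put roughly the same count in each bin."""
    s = sorted(values)
    n = len(s)
    return [s[(i * n) // k] for i in range(1, k)]


def assign_bin(x, edges):
    """Bin index of x: the number of edges that x is greater than or equal to."""
    return sum(x >= e for e in edges)


# Made-up values with one outlier, to show how the two schemes differ.
values = [1, 2, 3, 4, 5, 6, 7, 8, 100]

ew = equal_width_edges(values, 3)
ef = equal_frequency_edges(values, 3)
ew_bins = [assign_bin(v, ew) for v in values]
ef_bins = [assign_bin(v, ef) for v in values]
print(ew, ew_bins)
print(ef, ef_bins)
```

With the outlier present, equal-width binning puts almost everything in the first bin, while equal-frequency binning spreads the values evenly. With two continuous variables, each can be binned separately and the pair of bin indices used as the combined category.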
