Design a data model to predict the value of a sum of functions - machine-learning

I am working on a data mining project and I have to design the following model.
I am given four features x1, x2, x3 and x4, and three functions defined on these
features, such that each function depends on some subset of the available features,
e.g.
F1(x1, x2) = x1^2 + 2x2^2
F2(x2, x3) = 2x2^2 + 3x3^3
F3(x3, x4) = 3x3^3 + 4x4^4
This means F1 is some function which depends on features x1 and x2, F2 is some function which depends on x2 and x3, and so on.
A training data set is available where the values of x1, x2, x3, x4 are known, together with the total sum F1+F2+F3 (I know the total sum, but not the value of each individual function).
Using these training data I have to build a model which can correctly predict the total sum of all the functions, i.e. F1+F2+F3.
I am new to the data mining and machine learning fields, so I apologize in advance if this question is too trivial or ill-posed. I have tried to model it in many ways, but I have no clear picture of how to proceed. I will appreciate any help on this.

Your problem is non-linear regression. You have features
x1, x2, x3, x4 and a target S, where S = F1 + F2 + F3.
You would like to predict S from the x_i, but S is a non-linear function of them.
Since S is non-linear, you need a non-linear regression algorithm for this problem. Ordinary non-linear regression may solve your problem, or you may choose other approaches. You could, for example, try tree regression or MARS (Multivariate Adaptive Regression Splines). Both are well-known algorithms, and commercial and open-source implementations are available.
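As a concrete illustration, here is a minimal sketch of this setup using scikit-learn's GradientBoostingRegressor as the non-linear regressor (the model choice and the synthetic data generation are my own assumptions, not part of the original question):
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
rng = np.random.default_rng(0)
# Simulate training data: the features are known, but only the total sum S is observed.
X = rng.uniform(0, 1, size=(5000, 4))
x1, x2, x3, x4 = X.T
S = (x1**2 + 2*x2**2) + (2*x2**2 + 3*x3**3) + (3*x3**3 + 4*x4**4)
X_train, X_test, S_train, S_test = train_test_split(X, S, random_state=0)
# A tree-based non-linear regressor learns S directly from x1..x4;
# it never needs the individual values of F1, F2, F3.
model = GradientBoostingRegressor(random_state=0)
model.fit(X_train, S_train)
print("R^2 on held-out data:", model.score(X_test, S_test))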

Related

Explain feature interaction vs feature correlation

I am confused about these two terms in the machine learning paradigm. Can anybody offer a clear explanation? I shall be grateful.
Feature correlation
means that two features X1 and X2 are dependent on each other, regardless of the target prediction Y. In other words, if I increase the value of X1, then X2 will also increase or decrease.
For example: features height (X1) and weight (X2) of a person, and prediction variable running speed (Y) of that person. If we increase the height, then the weight will also tend to increase.
Feature interaction,
on the other hand, describes how our model outputs a prediction on the basis of features X1 and X2 jointly: what the output is if only X1 is used, what it is if only X2 is used, and what the prediction is if a combination of X1 and X2 is used. This combination defines the interaction among features; it may involve any of the operations (+, -, *, /).
For example: house size (X1), house location (X2) and price (prediction Y). X1 and X2 are not correlated with each other, but both of them contribute to the prediction of the house price.
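A small sketch of the distinction (the synthetic data here is my own illustration, not from the question):
import numpy as np
rng = np.random.default_rng(0)
# Correlation: height and weight move together regardless of any target.
height = rng.normal(170, 10, 1000)
weight = 0.9 * height + rng.normal(0, 5, 1000)
print("corr(height, weight):", np.corrcoef(height, weight)[0, 1])  # close to 1
# Interaction: size and location are uncorrelated, but the price depends
# on their product, so a model needs both features together to predict it.
size = rng.uniform(50, 200, 1000)
location = rng.uniform(0, 1, 1000)
price = 1000 * size * location  # a multiplicative (*) interaction
print("corr(size, location):", np.corrcoef(size, location)[0, 1])  # near 0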

Why is make_friedman1 used?

I have started to learn ML and am confused by make_friedman1. It greatly improved my accuracy and increased the data size. But the data isn't the same; it changed after using this function. What does make_friedman1 actually do?
If the make_friedman1 asked about here is the one in sklearn.datasets, then it is the function which generates the "Friedman #1" regression problem. Its inputs are 10 independent variables uniformly distributed on the interval [0, 1], of which only 5 are actually used. Outputs are created according to the formula
y = 10 sin(π x1 x2) + 20 (x3 - 0.5)^2 + 10 x4 + 5 x5 + e
where e ~ N(0, sd).
Quoting from Friedman's original paper, Multivariate Adaptive Regression Splines:
A new method is presented for flexible regression modeling of high dimensional data. The model takes the form of an expansion in product spline basis functions, where the number of basis functions as well as the parameters associated with each one (product degree and knot locations) are automatically determined by the data. This procedure is motivated by the recursive partitioning approach to regression and shares its attractive properties. Unlike recursive partitioning, however, this method produces continuous models with continuous derivatives. It has more power and flexibility to model relationships that are nearly additive or involve interactions in at most a few variables.
A spline joins many polynomial curve segments end-to-end to make a new smooth curve.
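For reference, a minimal example of generating the data (the parameters follow the sklearn.datasets API):
from sklearn.datasets import make_friedman1
# 200 samples, 10 features (only the first 5 affect y), Gaussian noise with sd = 1.0
X, y = make_friedman1(n_samples=200, n_features=10, noise=1.0, random_state=0)
print(X.shape, y.shape)  # (200, 10) (200,)
Note that make_friedman1 only generates synthetic data; any change in accuracy comes from training on this generated problem rather than on your original data.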

Understanding Polynomial Regression Equation with multiple Independent Variable

Understanding Polynomial Regression.
I understand that we use polynomial regression for certain kinds of non-linear data sets, to fit a curve to them. I know how to write the polynomial regression equation for a single independent variable, but I don't really understand how this equation is constructed for 2 variables:
y = a1 * x1 + a2 * x2 + a3 * x1*x2 + a4 * x1^2 + a5 * x2^2
What will the equation be for polynomial regression in the case of 3 or more variables? What is the actual logic behind building these polynomial equations for more than one variable?
You can choose whatever you want, but the general 'formula' (to the best of my own experience and knowledge) is:
Powers (x1, x1^2, x1^3, etc.) up to whichever degree you choose (many stop at 2).
Cross products (x1 * x2, x1 * x3, etc.).
Combinations (x1^2 * x2, x1 * x2^2, etc.), and then even higher-order combinations (x1 * x2 * x3, possibly with powers as well).
But this quickly gets out of hand, and you can end up with far too many features.
I would stick to powers up to 2 and cross products of pairs only, with no powers, much like your example; if you have three variables, add the product of all three, but with more than three variables I wouldn't bother with triplets.
The idea with polynomials is that you model the complex relationship between the features and the target; polynomials are often a good approximation to more complex relationships (ones that are not really polynomial in nature).
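scikit-learn's PolynomialFeatures enumerates exactly these terms, so a quick way to see the degree-2 equation for 3 variables (my own illustration) is:
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=2, include_bias=False)
poly.fit(np.zeros((1, 3)))  # 3 input variables: x0, x1, x2
print(poly.get_feature_names_out())
# ['x0' 'x1' 'x2' 'x0^2' 'x0 x1' 'x0 x2' 'x1^2' 'x1 x2' 'x2^2']
Each printed name is one term of the equation; a linear regression then learns one coefficient per term.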
I hope this is what you meant and that this can help you.

Can any machine learning algorithm find this pattern: x1 < x2 without generating a new feature (e.g. x1-x2) first?

If I had 2 features x1 and x2 where I know that the pattern is:
if x1 < x2 then
class1
else
class2
Can any machine learning algorithm find such a pattern? What algorithm would that be?
I know that I could create a third feature x3 = x1-x2. Then feature x3 can easily be used by some machine learning algorithms. For example a decision tree can solve the problem 100% using x3 and just 3 nodes (1 decision and 2 leaf nodes).
But, is it possible to solve this without creating new features? This seems like a problem that should be easily solved 100% if a machine learning algorithm could only find such a pattern.
I tried an MLP and an SVM with different kernels, including the RBF kernel, and the results are not great. As an example of what I tried, here is the scikit-learn code where the SVM could only reach a score of 0.992:
import numpy as np
from sklearn.svm import SVC
# Generate 1000 samples with 2 features with random values
X_train = np.random.rand(1000,2)
# Label each sample. If feature "x1" is less than feature "x2" then label as 1, otherwise label is 0.
y_train = X_train[:,0] < X_train[:,1]
y_train = y_train.astype(int) # convert boolean to 0 and 1
svc = SVC(kernel = "rbf", C = 0.9) # tried all kernels and C values from 0.1 to 1.0
svc.fit(X_train, y_train)
print("SVC score: %f" % svc.score(X_train, y_train))
Output running the code:
SVC score: 0.992000
This is an oversimplification of my problem. The real problem may have hundreds of features and different patterns, not just x1 < x2. However, to start with, it would help a lot to know how to solve this simple pattern.
To understand this, you must look into the parameters provided by sklearn's SVC, and C in particular. It also helps to understand how the value of C influences the classifier's training procedure.
If you look at the objective in the User Guide for SVC, it has two main parts: the first part tries to find a small set of weights that solves the problem, and the second part tries to minimize the classification errors.
C is the penalty multiplier associated with misclassifications. If you decrease C, you reduce the penalty (lower training accuracy but better generalization to test data), and vice versa.
Try setting C to 1e+6. You will see that you then almost always get 100% accuracy: the classifier has learnt the pattern x1 < x2. With the default settings, however, it decides that 99.2% accuracy is good enough because of another parameter called tol, which controls how much error is negligible to you; by default it is set to 1e-3. If you reduce the tolerance, you can also expect similar results.
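A quick check of this on the same data (reusing X_train and y_train from the question's code):
svc = SVC(kernel="rbf", C=1e6)  # much larger penalty on misclassified points
svc.fit(X_train, y_train)
print("SVC score: %f" % svc.score(X_train, y_train))  # close to 1.0 on the training set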
In general, I would suggest using something like GridSearchCV to find the optimal values of hyperparameters like C, since it internally splits the dataset into training and validation folds. This helps ensure that you are not just tweaking the hyperparameters to get good training accuracy, but that the classifier will also do well in practice.
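A minimal sketch of that search (the parameter grid is my own choice):
from sklearn.model_selection import GridSearchCV
param_grid = {"C": [0.1, 1, 10, 100, 1e4, 1e6]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)  # best C by cross-validated accuracy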

complex machine learning application

Two questions concerning machine learning algorithms like linear/logistic regression, ANNs and SVMs:
1. The models in these algorithms deal with data sets where each example has a number of features and one possible output value (e.g., predicting the price of a house from its features). But what if the features are enough to produce more than one piece of information about the item of interest, i.e., more than one output? Consider this example: a data set about cars where each example (car) has the features initial velocity, acceleration and time. In the real world these features are enough to determine two quantities: velocity via v = v_i + at, and distance via s = v_i*t + 0.5*a*t^2. So I want example X with features (x1, x2, ..., xn) to have outputs y1 and y2 in the same step, so that after training the model, if a new car example is given with initial velocity, acceleration and time, the model will be able to predict the velocity and the distance at the same time. Is this possible?
2. In the house price prediction example, where example X is given with features (x1, x2, x3) and the model predicts the price: can the process be reversed by any means? That is, if I give the model example X with features x1, x2 and the price y, can it predict the feature x3?
Depends on the model. A linear model such as linear regression cannot reliably learn the distance formula, since it is a cubic function of the given variables. You'd need to add v_i×t and a×t² as features to get a good prediction of the distance. A non-linear model such as a cubic-kernel SVM regression or a multi-layer ANN should be able to learn this from the given features, though, given enough data.
More generally, predicting multiple values with a single model sometimes works and sometimes doesn't; when in doubt, just fit several models.
As for the second question: you can try. Whether it will work depends on the relation between the variables and on the model.
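A hedged sketch of the multi-output case (the engineered features follow the suggestion above; the data generation is my own assumption):
import numpy as np
from sklearn.linear_model import LinearRegression
rng = np.random.default_rng(0)
v_i = rng.uniform(0, 30, 1000)  # initial velocity
a = rng.uniform(0, 5, 1000)     # acceleration
t = rng.uniform(0, 10, 1000)    # time
v = v_i + a * t                 # target 1: final velocity
s = v_i * t + 0.5 * a * t**2    # target 2: distance
# With the engineered product features, both targets are linear in the inputs,
# so a single linear model can predict both at once (multi-output regression).
X = np.column_stack([v_i, a * t, v_i * t, a * t**2])
Y = np.column_stack([v, s])
model = LinearRegression().fit(X, Y)
print(model.score(X, Y))  # ~1.0: both formulas are recovered exactly
For the second question, you could similarly fit a model that takes x1, x2 and the price y as inputs and x3 as the target, but whether that works depends entirely on whether x3 is actually determined by them.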
