After building a model, we save it to do live predictions. Saving the model is simple if there is no feature engineering, but suppose I have done some feature selection, for instance chi-square tests or random forest importance, to find the features that contribute to model accuracy. When I save this model, the features used to build it are entirely different from the raw data that was passed in during training. How do I handle this?
Thanks in advance.
TL;DR: You have to run the feature generation pipeline on your unseen data as well before passing it through the model.
Long version: it is the parameters, not the features, that are saved in the model. For example, suppose you have 10 points in the Cartesian plane (the x and y coordinates are the features) and you transform them to polar coordinates, say r and theta. You then model them as a circle: based on the transformed features (the coordinates in polar space) you calculate the best-fitting centre C and radius R. You can save C and R as the model. The model does not have the features saved in it, only the parameters C and R. Now, given a new point, you transform it into polar space before using the model for decision making.
So, the feature generation pipeline (transformation to polar space in the above example) along with the model (center and radius) is enough for modeling purposes.
Hope this clarifies the doubt.
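In practice (scikit-learn shown here purely as an illustration, with placeholder step and variable names), one way to do this is to wrap the feature-engineering steps and the estimator in a single Pipeline and persist that whole object, so raw data goes through the same transformation at prediction time:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
import joblib

# Toy raw data standing in for your real training set
X_raw, y = make_classification(n_samples=200, n_features=30, random_state=0)

# Feature engineering + model bundled together; chi2 needs non-negative
# inputs, hence the MinMaxScaler in front of the selector
pipeline = Pipeline([
    ('scale', MinMaxScaler()),
    ('select', SelectKBest(score_func=chi2, k=10)),   # keep the 10 "best" features
    ('model', RandomForestClassifier(n_estimators=100, random_state=0)),
])
pipeline.fit(X_raw, y)

joblib.dump(pipeline, 'model_with_features.joblib')   # save transform + model together

# Later, at prediction time: load and pass raw data; the pipeline
# re-applies the feature engineering before the model sees it
loaded = joblib.load('model_with_features.joblib')
predictions = loaded.predict(X_raw[:5])

Anything that was fitted on the training data (scalers, selectors, encoders) belongs inside the saved pipeline, not outside it.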
I just learned that a generative model tries to learn p(x|z)p(z) = p(x,z).
But after studying some sample code of generative models such as VAEs and GANs, I found that the output of the model is the generated image x, which is a 2D matrix.
In my understanding, the content of the matrix represents the probability of every pixel given the latent variable; is this right?
If it is, is it possible to get the joint probability p(x,z) between the latent variable z and a whole image x from a generative model?
Thanks!
What a generative model is trying to learn is just p(x). Here p(x|z) = 1 if g(z) = x and 0 otherwise, because GANs and VAEs are deterministic mappings and therefore always map the same input z to the same output.
Extracting the probability of x is not an easy task, though, and depends on the approach. With GANs you can approximate it by sampling from the model: for example, you sample 1000 images and count how often each image occurs. That image then has a probability of occurrences / 1000. By the law of large numbers you will eventually recover the actual probability distribution of your generator this way.
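As a rough illustration of the counting idea (the generator below is a toy stand-in, not a real trained GAN):

import numpy as np
from collections import Counter

# Toy stand-in for a trained generator: any deterministic mapping
# g(z) -> image will do for illustrating the counting idea
def generator(z):
    return (z > 0).astype(int).reshape(2, 2)   # tiny 2x2 "image"

rng = np.random.default_rng(0)
n_samples = 1000
counts = Counter()
for _ in range(n_samples):
    z = rng.standard_normal(4)    # latent sample z ~ N(0, I)
    x = generator(z)              # deterministic mapping g(z) = x
    counts[x.tobytes()] += 1      # hash the generated image and count it

# Empirical estimate of p(x) under the generator: occurrences / n_samples
p_x = {image: c / n_samples for image, c in counts.items()}
print(sorted(p_x.values(), reverse=True)[:5])

For real, continuous images you would have to bin or otherwise discretise them first, since exact repeats are essentially never observed.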
If you want an exact way to calculate probabilities, you can use flow-based networks like GLOW or RealNVP, which optimize log(p(x)) directly and provide a way to recover p(x).
I am working on a multiple regression problem. I have the data set below.
rank -- discipline -- yrs.since.phd -- yrs.service -- sex -- salary
[1, 1, 19, 18, 1, 139750], ...
I am taking salary as the dependent variable and the other variables as independent variables. After preprocessing the data, I ran gradient descent for the regression model and estimated the bias (intercept) and the coefficients for all independent features.
I want to make a scatter plot of the actual values together with the regression line for the hypothesis I fitted. Since there is more than one feature here, I have the questions below.

While plotting the actual values (scatter plot), how do I decide the x-axis values? Each row is a list of values; for example, the first row is [1, 1, 19, 18, 1] => 139750. How do I transform or map [1, 1, 19, 18, 1] onto the x-axis? I need to somehow reduce [1, 1, 19, 18, 1] to one value so I can mark a point (x, y) in the plot.

While plotting the regression line, what feature values should I use to calculate the hypothesis value? I now have the intercept and the weight of every feature, but I do not have the feature values. How do I decide on the feature values?

I want to calculate the points myself and use matplotlib to do the plotting. I am aware that there are plenty of tools, including matplotlib, that can do the job, but I want to get a basic understanding.
Thanks.
I am still not sure I completely understand your question, so if something is not what you expected, comment below and we will work it out.
Now,
Query 1: In any such dataset you will have multiple inputs, and there is no way to view the target variable (salary, in your case) with respect to all of them in a single graph. What is usually done is either to apply dimensionality reduction to your data, for example t-SNE (link) or principal component analysis (PCA), so that your output becomes a function of two or three variables you can then plot, or, the technique I prefer, to plot the target against each variable separately as subplots (see the sketch below). The reason is that we simply have no way to visualise data that lives in more than three dimensions.
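A minimal matplotlib sketch of the subplot option, assuming df is a pandas DataFrame with the columns from your question:

import matplotlib.pyplot as plt

# Plot salary against each input feature as its own subplot
features = ['rank', 'discipline', 'yrs.since.phd', 'yrs.service', 'sex']
fig, axes = plt.subplots(1, len(features), figsize=(18, 3), sharey=True)
for ax, col in zip(axes, features):
    ax.scatter(df[col], df['salary'], s=10)
    ax.set_xlabel(col)
axes[0].set_ylabel('salary')
plt.tight_layout()
plt.show()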
Query 2: If you are not determined to use matplotlib, I would suggest seaborn.regplot(), but let's also do it in matplotlib. Suppose the first pair you want to look at is 'discipline' vs 'salary'.
from sklearn.linear_model import LinearRegression

lm = LinearRegression()
X = df[['discipline']]   # single feature, kept 2D for sklearn
Y = df['salary']         # target
lm.fit(X, Y)             # fit salary ~ discipline
After running this, lm.coef_ will give you the coefficient and lm.intercept_ the intercept of the linear equation for this variable; then you can easily plot the data for the two variables together with the line using matplotlib, as in the sketch below.
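Continuing the snippet above (df and lm are assumed to be the DataFrame and fitted model from it), a plain matplotlib version could look like this:

import numpy as np
import matplotlib.pyplot as plt

x = df['discipline'].to_numpy()
y = df['salary'].to_numpy()

plt.scatter(x, y, label='actual data')              # scatter plot of the actual values

x_line = np.linspace(x.min(), x.max(), 100)
y_line = lm.intercept_ + lm.coef_[0] * x_line       # hypothesis: y = b + m*x
plt.plot(x_line, y_line, color='red', label='regression line')

plt.xlabel('discipline')
plt.ylabel('salary')
plt.legend()
plt.show()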
What you can do is ->
from pandas import plotting as pdplt

pdplt.scatter_matrix(dataframe, figsize=(10, 10))   # plus any other optional parameters you need
This gives you a matrix of plots (in your case 6x6) showing exactly how each column in your dataframe relates to the other columns, so you can clearly visualise which features dominate the result and how the features are correlated with each other.

If you ask me, this is the first thing I do with these types of problems; I then remove all correlated features and select the features that best approximate the output.

But since you have to draw a 2D plot, and with the above approach you might find more than a single feature dominating the output, what you can do is use a little miracle named PCA.
If you ask me, PCA is one of the most beautiful things in machine learning. What it does is merge all your features in some 'magical' ratio that generates the principal components of your data. Principal components are the components that make the major contribution to your model. You apply PCA by simply importing it from sklearn and then selecting the first principal component (as you need a 2D plot), or you might select two principal components and plot a 3D graph. But always remember that these principal components are not the real features of your model; they are combinations of them, and how PCA builds them is very interesting (using concepts like eigenvalues and eigenvectors), so you can also build it yourself. A minimal sketch follows below.
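Something along these lines (a sketch, assuming the same df and column names as in the question):

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

# Scale first so that no single feature dominates the components
feature_cols = ['rank', 'discipline', 'yrs.since.phd', 'yrs.service', 'sex']
X_scaled = StandardScaler().fit_transform(df[feature_cols])

pca = PCA(n_components=1)              # keep only the first principal component
X_pc1 = pca.fit_transform(X_scaled)    # shape (n_samples, 1)
print(pca.explained_variance_ratio_)   # how much variance PC1 captures

plt.scatter(X_pc1, df['salary'])
plt.xlabel('first principal component')
plt.ylabel('salary')
plt.show()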
Apart from all this, you can apply singular value decomposition (SVD) to your data, which is at the heart of linear algebra: a matrix decomposition that exists for every matrix. It decomposes your matrix into three matrices, one of which is a diagonal matrix holding the singular values (scaling factors) in descending order. What you do is select the top singular values (in your case only the first, which has the highest magnitude), reconstruct the feature matrix reduced from 5 columns to 1 column, and then plot that. You can do SVD using numpy.linalg, as in the sketch below.
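A rough numpy sketch of that, under the same assumptions about df:

import numpy as np

# Economy-size SVD of the feature matrix: X = U @ diag(s) @ Vt, with the
# singular values in s already sorted in descending order
feature_cols = ['rank', 'discipline', 'yrs.since.phd', 'yrs.service', 'sex']
X = df[feature_cols].to_numpy(dtype=float)
U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 1                       # keep only the largest singular value
X_1d = U[:, :k] * s[:k]     # rank-1 projection: 5 columns compressed to 1
print(s)                    # inspect how dominant the first singular value is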
Once you have applied any of these methods, you can learn your hypothesis with only the single most important selected feature and finally plot the graph. But a tip: be careful about discarding other important features just to draw a 2D plot, because you may have three principal components with almost the same contribution, or the top three singular values may be very close to each other. So take all the important features into account, and if you need a visualisation of those important features, use the scatter matrix.
Summary ->
All I want to mention is that you can follow the same process with any of these techniques, and you can also invent your own statistical or mathematical method for compressing your feature space.

For me, I prefer to go with PCA, and for this type of problem I even plot the scatter matrix first to get a visual intuition for the data. PCA and SVD also help to remove redundancy and hence overfitting.

For the rest of the details, refer to the docs.
Happy machine learning...
I have a series of tri-axial accelerometer data of dimension (N, 1000, 3), where N is the number of instances, 1000 is the length of the acceleration data (i.e. 10 seconds sampled at 100 Hz) and 3 is the number of axes: X, Y and Z. The data is divided into two classes, A and B, where A accounts for 95% of the data. In total I have just under 3000 instances of class B. The aim of my project is to create a model to detect class B.
I have been creating a number of machine learning models (decision trees, boosted models, etc.) with features obtained via signal processing and statistics (e.g. standard deviation, mean, magnitude, area under the curve). These models perform well, but they seem to miss a number of real-world events that I can distinguish by eye. This led me to believe that my features are missing key components of the classes. I have been going down the rabbit hole of signal processing, but so far there has been no Eureka moment.
Now, I am no expert in deep learning, but combining the data into a single axis (i.e. taking the magnitude) gave promising results (i.e. just as good as the current models). However, taking the magnitude removes information. So I was wondering whether there is a way to use deep learning to 1. select features from the individual axes and 2. use these as input to another deep learner that performs the classification. Something like this:
My simple view of a multiple-axis deep learner: the individual axes (i.e. X, Y and Z) are fed into separate deep learners, and their outputs are then fed into a single deep learner.
Apologies for the wall of text and the lack of examples; I am not allowed to share the data and am only looking for guidance on whether deep learning can help. Thanks for taking the time to read my post.
Since there are no specifics in the question, the answer can only be given in general terms.
If the magnitude gives good results, you can feed X, Y, Z and the magnitude into a single deep learner as 4 inputs, as in the sketch below.
In this case, your deep learner will be able to use a) the separate features of each axis, b) the data combined into a single axis, and c) the relationships between the axes.
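As one possible (illustrative, not prescriptive) realisation of that idea, here is a small Keras sketch of a single 1D CNN fed with X, Y, Z plus the magnitude as a fourth channel; the layer sizes and the dummy data are placeholders:

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Append the magnitude sqrt(x^2 + y^2 + z^2) as a 4th channel: (N, 1000, 4)
def add_magnitude(batch):
    mag = np.linalg.norm(batch, axis=-1, keepdims=True)
    return np.concatenate([batch, mag], axis=-1)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1000, 4)),                # 10 s at 100 Hz, 4 channels
    layers.Conv1D(32, kernel_size=7, activation='relu'),
    layers.MaxPooling1D(4),
    layers.Conv1D(64, kernel_size=7, activation='relu'),
    layers.GlobalAveragePooling1D(),
    layers.Dense(1, activation='sigmoid'),          # probability of class B
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=[tf.keras.metrics.AUC(curve='PR', name='pr_auc')])

# Placeholder data just to show the shapes; with a 95/5 class imbalance you
# would also want class weights or resampling on the real data
X_raw = np.random.randn(8, 1000, 3)
y = np.random.randint(0, 2, size=8)
model.fit(add_magnitude(X_raw), y, epochs=1, verbose=0)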
Can anybody tell me the differences between PCA (principal component analysis), truncated SVD (truncated singular value decomposition) and ICA (independent component analysis) in detail?
Doing it in detail would require a long PDF document :-).
But the idea is simple:
Principal Component Analysis (PCA) - analyses the data in its native coordinates, namely the coordinates along which the data has most of its energy (variance). For n samples of dimension d there will be $d$ orthogonal directions such that the data projected onto them has no correlation. If we look at the data as random variables, it means we have found a coordinate system in which the cross-correlation (a second-order moment) of any pair of the projected components vanishes.
This is a very efficient way to approximate the data in lower dimensionality by keeping most of its energy.
Truncated SVD - one can show that one way of computing this coordinate system is via the SVD. Hence this is a method for applying the ideas behind PCA.
Independent Component Analysis (ICA) - this goes one step further than PCA. While in PCA we dealt only with second-order moments of the data (correlation), in ICA we look into higher moments and try to find a projection of the data under which those higher moments vanish as well (think lack of correlation vs. full probabilistic independence). A quick sklearn sketch of all three is below.
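A quick scikit-learn sketch on toy data, just to show the three side by side (the class names are the actual sklearn ones; the data is made up):

import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD, FastICA

# Toy data: 300 samples, 5 correlated features (purely for illustration)
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 5)) @ rng.standard_normal((5, 5))

# PCA: centre the data, then find orthogonal directions of maximal variance
X_pca = PCA(n_components=2).fit_transform(X)

# Truncated SVD: factorise X directly (no centring), which is why it also
# works on sparse matrices; on centred data it spans the same subspace as PCA
X_svd = TruncatedSVD(n_components=2).fit_transform(X)

# ICA: look for statistically independent components, i.e. use higher-order
# statistics rather than just decorrelation
X_ica = FastICA(n_components=2, random_state=0).fit_transform(X)

print(X_pca.shape, X_svd.shape, X_ica.shape)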
Let's suppose I have a noisy 2D data set in which a person looking at the data could easily draw a straight line so that the mean squared error is minimised.

The model of the line has the form y = mx + b, where x is the input value, y is the predicted value of the model, and m and b are variables trained to minimise the cost.

My question is: if we plug some input x1 into the model, it will always output the same number, no matter how sparse the data is. How can a model like this predict different values from the same input?

Maybe this could be done by taking all the errors from the model line to the points, building a distribution of them, taking the expected value of that distribution and then adding that value to y?
If the data is 2D and can be perfectly modelled with a straight line, then there is no data-based or statistical reason not to claim that the process is fully deterministic, and you should output one value.
However, if you have many more dimensions, or your fit is not perfect (the error is minimised but not 0), then what you are after is either predicting a distribution of values or at least confidence bounds. There are many probabilistic models that can model a distribution over the outputs rather than a single value. In particular, linear regression does this: it assumes a Gaussian error around your predictions, so once you obtain the MSE "A" you can draw predictions from N(mx + b, A), which, as you can easily see, degenerates to the deterministic model when A = 0. These predictions are optimal in expectation, and they are simply your way of "simulating observations" according to the model. There are also meta-methods: if you treat your predictor as a black box, you can train multiple models on subsets of the data and treat their predictions as samples to fit a distribution (again, for simplicity, it could be a single Gaussian). A minimal sketch of the first idea is below.
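A minimal numpy sketch of the linear-regression version of this (synthetic data, just to illustrate the sampling step):

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 2.0 * x + 1.0 + rng.normal(0, 1.5, size=x.shape)   # noisy straight line

m, b = np.polyfit(x, y, deg=1)         # least-squares fit of y = m*x + b
residuals = y - (m * x + b)
sigma = residuals.std()                # sigma^2 is (roughly) the MSE "A" above

x1 = 5.0
samples = rng.normal(m * x1 + b, sigma, size=5)   # different outputs, same input
print(m * x1 + b, samples)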