How can I run an MLJ model on a GPU? - machine-learning

Is there any way to run an MLJLinearModels model on a GPU?
Example
using MLJ
X = MLJ.table(rand(100, 10));
y = 2X.x1 - X.x2 + 0.05*rand(100);
@load LinearRegressor pkg=MLJLinearModels verbosity=0;
model = LinearRegressor()
mach = machine(model, X, y);
fit!(mach)
params = fitted_params(mach)
params.coefs # coefficient of the regression with names
params.intercept # intercept
Xnew = MLJ.table(rand(3, 10));
ypred = predict(mach, Xnew)
This example is taken from the documentation.
Please suggest how I can train this model on a GPU.
Thanks in advance!

Related

Image Deconvolution

To practice Wiener deconvolution, I'm trying to perform a simple deconvolution:
import numpy as np
from numpy.fft import fft2, ifft2  # or scipy's fft2/ifft2
from scipy import signal

def div(img1, img2):
    res = np.zeros(img2.shape, dtype='complex_')
    for i in range(img2.shape[0]):
        for j in range(img2.shape[1]):  # the original looped over shape[0] here too
            if np.abs(img2[i][j]) > 0.001:
                res[i][j] = 1 / img2[i][j]
            else:
                res[i][j] = 0.001
    return res

filtre = np.asarray([[1, 1, 1],
                     [1, 1, 1],
                     [1, 1, 1]]) / 9
filtre_freq = fft2(filtre)
v = signal.convolve(img, filtre)  # blurred image
F = div(1, filtre_freq)           # inverse filter in the frequency domain
f = ifft2(F)
res = signal.convolve(v, f)       # convolve with the spatial-domain inverse filter
I am trying to compute the inverse filter in the frequency domain, bring it back to the spatial domain, and do the convolution with the inverse filter. On paper it's pretty simple, even if I have to handle the divisions by zero without really knowing how best to do it.
But my results seem really inconsistent.
If anyone can enlighten me on this ... Thanks in advance and have a great evening.
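As a side note, the per-pixel division-by-zero handling described above can be folded into a single regularized division. Here is a minimal sketch of that idea (the name wiener_deconvolve and the eps value are illustrative, not from the question); dividing by |H|^2 + eps never hits zero, which replaces the threshold test:
import numpy as np
from numpy.fft import fft2, ifft2

def wiener_deconvolve(blurred, kernel, eps=1e-3):
    # Pad the kernel spectrum to the image size so the two spectra align.
    H = fft2(kernel, s=blurred.shape)
    # Regularized inverse: conj(H) / (|H|^2 + eps) avoids dividing by zero.
    H_inv = np.conj(H) / (np.abs(H) ** 2 + eps)
    return np.real(ifft2(fft2(blurred) * H_inv))
One likely source of the inconsistent results: signal.convolve performs zero-padded linear convolution (and its default 'full' mode even enlarges the output), while division in the frequency domain undoes circular convolution at the image size. The blur and the deblur should use the same convolution model, e.g. blur via fft2/ifft2 as well, or crop and align the output of signal.convolve before deconvolving.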

Compare results from Julia MLJ models

I'd like to train 3 models in MLJ.jl: ARDRegressor, AdaBoostRegressor, BaggingRegressor
Currently I train them one at a time, for example:
using Pkg; Pkg.activate("."); Pkg.instantiate();
using RDatasets, MLJ, Statistics, PrettyPrinting, GLM
X, y = @load_boston; train, test = 1:406, 407:506
@load ARDRegressor
reg = ARDRegressor
m = machine(reg(), X, y);
fit!(m, rows=train);
ŷ = predict(m, rows=test)
os_ARDRegressor = rms(ŷ, y[test])
I'd like to train them with a loop such as:
modlist = [ARDRegressor; AdaBoostRegressor; BaggingRegressor]
score = []
for (i, mod) in enumerate(modlist)
    @load mod;
    reg = mod;
    m = machine(reg(), X, y);
    fit!(m, rows=train);
    ŷ = predict(m, rows=test)
    push!(score, (i, mod, rms(ŷ, y[test])))
end
There are a few issues with running your last code block.
The loop for jj in eachindex(Models) iterates over the indices of the Models array, so jj takes the values 1, 2, 3. Instead, loop over the Models array directly.
@load ARDRegressor is a macro invocation; this means @load jj will translate to @load("jj"), so it doesn't use jj as the variable you intended.
The value of os_jj will be overwritten on every iteration of the loop. You instead want to keep the score in an array at that index: os[jj] = ...
MLJ requires you to import the packages that contain the models before loading them. Remember that using the ScikitLearn models requires the sklearn Python package to be installed in your current environment.
Consider the following working code example:
using MLJ, RDatasets
X, y = @load_boston; train, test = 1:406, 407:506
models = [@load ARDRegressor; @load AdaBoostRegressor; @load BaggingRegressor]
score = Array{Float64}(undef, 3)
for (i, model) in enumerate(models)
    m = machine(model, X, y)
    fit!(m, rows=train);
    ŷ = predict(m, rows=test)
    score[i] = rms(ŷ, y[test])
end
@show score
Side note: using Pkg; Pkg.activate(".") is unnecessary when you start Julia with julia --project, but this comes down to personal preference.

Need a vectorized solution in PyTorch

I'm doing an experiment using face images in the PyTorch framework. The input x is a face image of size 5 × 5 (height × width) with 192 channels.
Objective: to obtain patches of x of patch_size (given as an argument).
I have obtained the required result with two for loops, but I want a vectorized solution so that the computation cost is much lower than with two for loops.
Setup: PyTorch 0.4.1, Nvidia TitanX GPU (12 GB).
The following is my implementation using two for loops
def extractpatches(x, patch_size):  # x is bs x 192 x 5 x 5
    patches = x.unfold(2, patch_size, 1).unfold(3, patch_size, 1)
    bs, c, pi, pj, _, _ = patches.size()  # patches: (bs, c, pi, pj, patch_size, patch_size)
    cnt = 0
    p = torch.empty((bs, pi*pj, c, patch_size, patch_size)).to(device)
    s = torch.empty((bs, pi*pj, c*patch_size*patch_size)).to(device)
    # Want a vectorized method instead of the two for loops below
    for i in range(pi):
        for j in range(pj):
            p[:, cnt, :, :, :] = patches[:, :, i, j, :, :]
            s[:, cnt, :] = p[:, cnt, :, :, :].view(-1, c*patch_size*patch_size)
            cnt = cnt + 1
    return s
Thanks for your help in advance.
I think you can try the following. I used some parts of your code for my experiment and it worked for me. Here l and f are the lists of tensor patches (note that this indexing assumes pi == pj):
l = [patches[:, :, i // pi, i % pi, :, :] for i in range(pi * pi)]
f = [l[i].contiguous().view(-1, c*patch_size*patch_size) for i in range(pi * pi)]
You can verify the above code using toy input values.
Thanks.
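For a fully loop-free version, a permute followed by reshape gives the same result as the double loop in the question. This is a sketch under the same shape assumptions (the function name is made up, and Tensor.reshape needs PyTorch >= 0.4):
def extractpatches_vectorized(x, patch_size):  # x is bs x 192 x 5 x 5
    # (bs, c, pi, pj, patch_size, patch_size), exactly as in the question
    patches = x.unfold(2, patch_size, 1).unfold(3, patch_size, 1)
    bs, c, pi, pj, _, _ = patches.size()
    # Move the patch-grid dims in front of the channel dim, then merge them:
    # (bs, pi*pj, c, patch_size, patch_size); the cnt = i*pj + j ordering is preserved.
    p = patches.permute(0, 2, 3, 1, 4, 5).reshape(bs, pi * pj, c, patch_size, patch_size)
    # Flatten each patch to a vector: (bs, pi*pj, c*patch_size*patch_size).
    s = p.reshape(bs, pi * pj, c * patch_size * patch_size)
    return s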

Linear Regression not optimizing for non-linear data

I am new to ML and am trying my hand at linear regression. I am using this dataset. The data and my "optimized" model look like this:
I am modifying the data like this:
X = np.vstack((np.ones((X.size)), X, X**2))
Y = np.log10(Y)
# have tried roots of Y and a degree-3 feature as well
Initial cost: 0.8086672720475084
Optimized cost: 0.7282965408177141
I am unable to optimize further, no matter the number of runs.
Increasing the learning rate causes the cost to increase.
The rest of my algorithm is fine, since I am able to optimize for a simpler dataset, shown below:
Sorry if this is something basic, but I can't seem to find a way to optimize my model for the original data.
EDIT:
Please have a look at my code, I don't know why it's not working:
def GradientDescent(X, Y, theta, alpha):
    m = X.shape[1]
    h = Predict(X, theta)
    gradient = np.dot(X, (h - Y))
    gradient.shape = (gradient.size, 1)
    gradient = gradient / m
    theta = theta - alpha * gradient
    cost = CostFunction(X, Y, theta)
    return theta, cost

def CostFunction(X, Y, theta):
    m = X.shape[1]
    h = Predict(X, theta)
    cost = h - Y
    cost = np.sum(np.square(cost)) / (2 * m)
    return cost

def Predict(X, theta):
    h = np.transpose(X).dot(theta)
    return h
X is 2 × 333 and y is 333 × 1.
I tried debugging it again but I can't find the problem. Please help me.
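For reference, here is a minimal end-to-end sketch that runs the three functions above on made-up, noiseless data (x_raw, the coefficients, and the alpha value are all illustrative, not from the question). Separately, note that when raw features span very different scales, e.g. X versus X**2 on large inputs, any single alpha small enough for the largest column barely moves the others, which matches the symptom that raising the learning rate blows up the cost; normalizing each feature usually helps:
import numpy as np

np.random.seed(0)
x_raw = np.linspace(0, 2, 333)
X = np.vstack((np.ones(x_raw.size), x_raw, x_raw**2))  # shape (3, 333)
Y = (1 + 2*x_raw - 0.5*x_raw**2).reshape(-1, 1)        # shape (333, 1)

theta = np.zeros((3, 1))
for _ in range(5000):
    theta, cost = GradientDescent(X, Y, theta, 0.1)
print(cost, theta.ravel())  # cost should steadily decrease toward 0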

Theano gradient doesn't work with .sum(), only .mean()?

I'm trying to learn Theano and decided to implement linear regression (using their logistic regression example from the tutorial as a template). I'm hitting something weird where T.grad doesn't work if my cost function uses .sum(), but does work if it uses .mean(). Code snippet:
(THIS DOESN'T WORK; IT RESULTS IN A w VECTOR FULL OF NaNs):
x = T.matrix('x')
y = T.vector('y')
w = theano.shared(rng.randn(feats), name='w')
b = theano.shared(0., name="b")
# now we do the actual expressions
h = T.dot(x,w) + b # prediction is dot product plus bias
single_error = .5 * ((h - y)**2)
cost = single_error.sum()
gw, gb = T.grad(cost, [w,b])
train = theano.function(inputs=[x,y], outputs=[h, single_error], updates = ((w, w - .1*gw), (b, b - .1*gb)))
predict = theano.function(inputs=[x], outputs=h)
for i in range(training_steps):
    pred, err = train(D[0], D[1])
(THIS DOES WORK, PERFECTLY):
x = T.matrix('x')
y = T.vector('y')
w = theano.shared(rng.randn(feats), name='w')
b = theano.shared(0., name="b")
# now we do the actual expressions
h = T.dot(x,w) + b # prediction is dot product plus bias
single_error = .5 * ((h - y)**2)
cost = single_error.mean()
gw, gb = T.grad(cost, [w,b])
train = theano.function(inputs=[x,y], outputs=[h, single_error], updates = ((w, w - .1*gw), (b, b - .1*gb)))
predict = theano.function(inputs=[x], outputs=h)
for i in range(training_steps):
    pred, err = train(D[0], D[1])
The only difference is cost = single_error.sum() vs. single_error.mean(). What I don't understand is that the gradient should be essentially the same in both cases (one is just a scaled version of the other). So what gives?
The learning rate (0.1) is way too big. Using mean divides the gradient by the batch size, which helps, but I'm pretty sure you should make it much smaller still, not just divide by the batch size (which is equivalent to using mean).
Try a learning rate of 0.001.
Try dividing your gradient descent step size by the number of training examples.
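To make the scale concrete: single_error.sum() is N times single_error.mean() for N training examples, so its gradient, and therefore the update step, is N times larger; with a base rate of 0.1 the effective step is 0.1 * N, which easily produces NaNs. A minimal sketch of the rescaling for the sum version, reusing the variables defined above (N and lr are illustrative names, not from the question):
N = D[0].shape[0]   # number of training examples
cost = single_error.sum()
gw, gb = T.grad(cost, [w, b])
lr = 0.1 / N        # gradient of sum is N times gradient of mean
train = theano.function(inputs=[x, y], outputs=[h, single_error],
                        updates=((w, w - lr * gw), (b, b - lr * gb)))
With this rescaling the sum version updates exactly like the mean version; a smaller rate such as the suggested 0.001 adds a further safety margin.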
