How to predict Total Hours needed with List as Input? - machine-learning

I am struggling with the following problem:
I have a dataset of different products (cars), each with certain work orders open at a given time. From historical data I know how much total time this work took.
Now I want to predict it for another car (e.g. Car 3).
Which type of algorithm (e.g. which kind of regression) should I use for this?
My idea was to transform this row-based dataset into a column-based one with binary values, e.g. Brake: 0/1, Screen: 0/1. But then I will have a lot of inputs, since the number of possible work orders is 100-200.
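The binary-column (multi-hot) encoding described above can be sketched with scikit-learn's `MultiLabelBinarizer`. It turns each car's list of work orders into one 0/1 column per work-order type. A linear regression on those columns then estimates the hours contributed by each work order. This is a minimal sketch under stated assumptions: the work-order names and hours are made up for illustration, and the model has no intercept. With 100-200 work-order types this is still an ordinary regression. You need noticeably more historical cars than work-order types, though, and a regularized model such as `sklearn.linear_model.Ridge` may be more stable.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical history: the open work orders of each car.
work_orders = [
    ["Brake", "Screen"],
    ["Brake", "Tyre"],
    ["Screen", "Tyre", "Door"],
    ["Door"],
    ["Brake", "Door"],
    ["Screen"],
]
# Made-up hours per work order, used only to generate consistent totals for the demo.
hours_per_order = {"Brake": 2.0, "Screen": 1.5, "Tyre": 0.5, "Door": 3.0}
total_hours = np.array([sum(hours_per_order[w] for w in orders) for orders in work_orders])

# One 0/1 column per work-order type (columns follow the sorted mlb.classes_).
mlb = MultiLabelBinarizer()
X = mlb.fit_transform(work_orders)

# Assumption: no intercept, so a car with no open work orders needs 0 hours.
reg = LinearRegression(fit_intercept=False)
reg.fit(X, total_hours)

# Predict the total for a new car (e.g. "Car 3") from its list of open work orders.
new_car = mlb.transform([["Brake", "Screen", "Door"]])
prediction = reg.predict(new_car)[0]
print(dict(zip(mlb.classes_, reg.coef_.round(2))), round(prediction, 2))
```

Because the demo totals are noise-free, the fitted coefficients recover the made-up hours exactly. With real data they are estimates, as the answer below shows.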

Here's a quick idea using multiple linear regression on 30 jobs, each of which is a random combination of 6 tasks with a "true cost" for each task. We can regress the total job costs against the task selections in each job to estimate the cost coefficients that best explain those totals.
This is done first with no noise in the system (the task costs are exact), then with some random noise added.
A more thorough job would include examining the R-squared value and plotting the residuals to check linearity.
In [1]: from sklearn import linear_model
In [2]: import numpy as np
In [3]: jobs = np.random.binomial(1, 0.6, (30, 6))
In [4]: true_costs = np.array([10, 20, 5, 53, 31, 42])
In [5]: jobs
Out[5]:
array([[0, 1, 1, 1, 1, 0],
[1, 0, 0, 1, 0, 1],
[1, 1, 0, 1, 0, 0],
[1, 0, 1, 1, 1, 1],
[1, 1, 0, 0, 1, 1],
[0, 1, 0, 0, 1, 0],
[1, 0, 0, 1, 1, 0],
[1, 1, 1, 1, 0, 1],
[1, 0, 0, 1, 0, 1],
[0, 1, 0, 1, 0, 0],
[0, 0, 1, 0, 1, 1],
[1, 0, 1, 1, 1, 1],
[0, 1, 1, 1, 1, 1],
[1, 0, 1, 1, 1, 1],
[0, 1, 1, 0, 1, 0],
[1, 0, 1, 0, 1, 0],
[1, 1, 1, 1, 1, 1],
[1, 0, 1, 0, 0, 1],
[0, 1, 0, 1, 1, 0],
[1, 1, 1, 0, 1, 0],
[1, 1, 1, 1, 1, 0],
[1, 0, 1, 0, 0, 1],
[0, 0, 0, 1, 1, 1],
[1, 1, 0, 1, 1, 1],
[1, 0, 1, 1, 0, 1],
[1, 1, 1, 1, 1, 1],
[1, 0, 1, 1, 1, 1],
[0, 0, 1, 1, 0, 0],
[1, 1, 0, 0, 1, 1],
[1, 1, 1, 1, 0, 0]])
In [6]: tot_job_costs = jobs @ true_costs
In [7]: reg = linear_model.LinearRegression()
In [8]: reg.fit(jobs, tot_job_costs)
Out[8]: LinearRegression()
In [9]: reg.coef_
Out[9]: array([10., 20., 5., 53., 31., 42.])
In [10]: np.random.normal?
In [11]: noise = np.random.normal(0, scale=5, size=30)
In [12]: noisy_costs = tot_job_costs + noise
In [13]: noisy_costs
Out[13]:
array([113.94632664, 103.82109478, 78.73776288, 145.12778089,
104.92931235, 48.14676751, 94.1052639 , 134.64827785,
109.58893129, 67.48897806, 75.70934522, 143.46588308,
143.12160502, 147.71249157, 53.93020167, 44.22848841,
159.64772255, 52.49447057, 102.70555991, 69.08774251,
125.10685342, 45.79436364, 129.81354375, 160.92510393,
108.59837665, 149.1673096 , 135.12600871, 60.55375843,
107.7925208 , 88.16833899])
In [14]: reg.fit(jobs, noisy_costs)
Out[14]: LinearRegression()
In [15]: reg.coef_
Out[15]:
array([12.09045186, 19.0013987 , 3.44981506, 55.21114084, 33.82282467,
40.48642199])
In [16]:
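The R-squared and residual checks mentioned above can be sketched as follows. This reruns the same experiment with a seeded NumPy `Generator` so the numbers are reproducible; the seed value 0 is an arbitrary choice. For `LinearRegression`, `reg.score` returns the R-squared of the predictions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)  # seeded so the run is reproducible
jobs = rng.binomial(1, 0.6, size=(30, 6))           # which tasks each job contains
true_costs = np.array([10, 20, 5, 53, 31, 42])
noisy_costs = jobs @ true_costs + rng.normal(0, 5, size=30)

reg = LinearRegression().fit(jobs, noisy_costs)

# R-squared: fraction of the variance in total cost explained by the task selections.
r_squared = reg.score(jobs, noisy_costs)

# Residuals: if the linear model is adequate they should look like structureless noise around 0.
residuals = noisy_costs - reg.predict(jobs)
print(reg.coef_.round(2), round(r_squared, 3))
```

To plot the residuals, something like matplotlib's `plt.scatter(reg.predict(jobs), residuals)` works. A visible curve or funnel shape would suggest the linear model is missing something.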


How to zero pad on both sides and encode the sequence into one hot in keras?

I have text data as follows.
X_train_orignal= np.array(['OC(=O)C1=C(Cl)C=CC=C1Cl', 'OC(=O)C1=C(Cl)C=C(Cl)C=C1Cl',
'OC(=O)C1=CC=CC(=C1Cl)Cl', 'OC(=O)C1=CC(=CC=C1Cl)Cl',
'OC1=C(C=C(C=C1)[N+]([O-])=O)[N+]([O-])=O'])
As you can see, the sequences have different lengths. How can I zero pad each sequence on both sides to some maximum length, and then convert each sequence into a one-hot encoding based on its characters?
What I tried:
I used the following Keras API, but it doesn't work with sequences of strings:
keras.preprocessing.sequence.pad_sequences(sequences, maxlen=None, dtype='int32', padding='pre', truncating='pre', value=0.0)
I might need to convert my sequence data into one-hot vectors first and then zero pad it. For that I tried to use Tokenizer as follows:
tk = Tokenizer(nb_words=?, split=?)
But what should split and nb_words be, given that my sequence data doesn't contain any spaces? And how do I use it for character-based one-hot encoding?
My overall goal is to zero pad my sequences and convert them to one-hot vectors before feeding them into an RNN.
I then came across a way to do it: use Tokenizer first, then pad_sequences to zero pad my sequences at the start, as follows:
from keras.preprocessing.text import Tokenizer
tokenizer = Tokenizer(char_level=True)
tokenizer.fit_on_texts(X_train_orignal)
sequence_of_int = tokenizer.texts_to_sequences(X_train_orignal)
This gives me the output as follows.
[[3, 1, 4, 2, 3, 5, 1, 6, 2, 1, 4, 1, 7, 5, 1, 2, 1, 1, 2, 1, 6, 1, 7],
[3,
1,
4,
2,
3,
5,
1,
6,
2,
1,
4,
1,
7,
5,
1,
2,
1,
4,
1,
7,
5,
1,
2,
1,
6,
1,
7],
[3, 1, 4, 2, 3, 5, 1, 6, 2, 1, 1, 2, 1, 1, 4, 2, 1, 6, 1, 7, 5, 1, 7],
[3, 1, 4, 2, 3, 5, 1, 6, 2, 1, 1, 4, 2, 1, 1, 2, 1, 6, 1, 7, 5, 1, 7],
[3,
1,
6,
2,
1,
4,
1,
2,
1,
4,
1,
2,
1,
6,
5,
8,
10,
11,
9,
4,
8,
3,
12,
9,
5,
2,
3,
5,
8,
10,
11,
9,
4,
8,
3,
12,
9,
5,
2,
3]]
Now I don't understand why sequence_of_int[1] and sequence_of_int[4] are printed in column format.
After getting the tokens, I applied pad_sequences as follows:
seq=keras.preprocessing.sequence.pad_sequences(sequence_of_int, maxlen=None, dtype='int32', padding='pre', value=0.0)
and it gives me the output as follows.
array([[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 3, 1, 4, 2, 3, 5, 1, 6, 2, 1, 4, 1, 7, 5, 1,
2, 1, 1, 2, 1, 6, 1, 7],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 1, 4,
2, 3, 5, 1, 6, 2, 1, 4, 1, 7, 5, 1, 2, 1, 4, 1,
7, 5, 1, 2, 1, 6, 1, 7],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 3, 1, 4, 2, 3, 5, 1, 6, 2, 1, 1, 2, 1, 1, 4,
2, 1, 6, 1, 7, 5, 1, 7],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 3, 1, 4, 2, 3, 5, 1, 6, 2, 1, 1, 4, 2, 1, 1,
2, 1, 6, 1, 7, 5, 1, 7],
[ 3, 1, 6, 2, 1, 4, 1, 2, 1, 4, 1, 2, 1, 6, 5, 8,
10, 11, 9, 4, 8, 3, 12, 9, 5, 2, 3, 5, 8, 10, 11, 9,
4, 8, 3, 12, 9, 5, 2, 3]], dtype=int32)
Then I converted it to one-hot as follows:
one_hot=keras.utils.to_categorical(seq)
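`pad_sequences` only accepts `padding='pre'` or `'post'`, so padding on both sides needs a small custom step. Below is a minimal NumPy-only sketch; `pad_both_sides` is a hypothetical helper, not a Keras function. The character vocabulary is built by hand so the example runs without Keras, and unlike Tokenizer's default `lower=True` it is case-sensitive. You could equally pass it the `sequence_of_int` produced by the Tokenizer. When the total padding is odd, the extra zero goes on the right, which is an arbitrary choice.

```python
import numpy as np

def pad_both_sides(seqs, maxlen=None, value=0):
    """Centre each integer sequence in a row of length maxlen, padding both sides with `value`."""
    if maxlen is None:
        maxlen = max(len(s) for s in seqs)
    out = np.full((len(seqs), maxlen), value, dtype=np.int32)
    for i, s in enumerate(seqs):
        s = list(s)[:maxlen]                 # truncate from the end if too long (arbitrary choice)
        left = (maxlen - len(s)) // 2        # extra padding (if odd) ends up on the right
        out[i, left:left + len(s)] = s
    return out

texts = ['OC(=O)C1=C(Cl)C=CC=C1Cl', 'OC(=O)C1=C(Cl)C=C(Cl)C=C1Cl',
         'OC(=O)C1=CC=CC(=C1Cl)Cl', 'OC(=O)C1=CC(=CC=C1Cl)Cl',
         'OC1=C(C=C(C=C1)[N+]([O-])=O)[N+]([O-])=O']

# Character-level vocabulary; index 0 is reserved for padding.
vocab = {ch: i + 1 for i, ch in enumerate(sorted(set("".join(texts))))}
seqs = [[vocab[ch] for ch in t] for t in texts]

padded = pad_both_sides(seqs)                                  # shape (5, 40)
one_hot = np.eye(len(vocab) + 1, dtype=np.float32)[padded]     # shape (5, 40, len(vocab) + 1)
```

`keras.utils.to_categorical(padded)` should produce the same one-hot array as the `np.eye` indexing. In both, the padding value 0 gets its own class.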

Why is the structuring element asymmetric in OpenCV?

cv2.getStructuringElement(cv2.MORPH_ELLIPSE, ksize=(4,4))
returns
array([[0, 0, 1, 0],
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]], dtype=uint8)
Why isn't it
array([[0, 1, 1, 0],
[1, 1, 1, 1],
[1, 1, 1, 1],
[0, 1, 1, 0]], dtype=uint8)
instead?
Odd-sized structuring elements (for example ksize=(5,5)) are also asymmetric with respect to 90-degree rotations:
array([[0, 0, 1, 0, 0],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[0, 0, 1, 0, 0]], dtype=uint8)
What's the purpose of that?
There's no purpose for it other than that it's one of many possible discretizations of such a shape. In the case of the 5x5 ellipse, filling in the rest of the top and bottom rows would make it identical to MORPH_RECT, and removing the same two pixels from the sides as from the top would make it a diamond. Either way, the actual implementation in the source code is what you would expect: it builds the ellipse from the distance function and rounds to the nearest integers to get the binary pixels. Search the source for cv::getStructuringElement and you'll find the implementation; it's nothing too fancy.
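To see where the asymmetry comes from, here is a minimal NumPy sketch that reproduces the kernels shown in this thread. It assumes, based on my reading of OpenCV's source, that the ellipse is built row by row: the centre sits at the integer position `ksize // 2`, and each row gets a horizontal half-width rounded to the nearest integer. The top row therefore always gets a half-width of 0 (a single pixel), and for even sizes the integer centre is off the true middle. `opencv_like_ellipse` is a hypothetical name, not an OpenCV function.

```python
import numpy as np

def opencv_like_ellipse(ksize):
    """Row-by-row elliptical kernel; assumed to mirror cv2.MORPH_ELLIPSE (a sketch)."""
    w, h = ksize                      # OpenCV sizes are (width, height)
    r, c = h // 2, w // 2             # integer centre: off the true middle for even sizes
    inv_r2 = 1.0 / (r * r) if r else 0.0
    kernel = np.zeros((h, w), dtype=np.uint8)
    for i in range(h):
        dy = i - r
        if abs(dy) <= r:
            # Horizontal half-width of the ellipse at this row, rounded to an integer.
            dx = int(np.rint(c * np.sqrt((r * r - dy * dy) * inv_r2)))
            j1, j2 = max(c - dx, 0), min(c + dx + 1, w)
            kernel[i, j1:j2] = 1
    return kernel

print(opencv_like_ellipse((4, 4)))
```

For ksize (4,4), (5,5) and (7,7) this gives the same arrays as those printed in this thread for cv2.getStructuringElement(cv2.MORPH_ELLIPSE, ...).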
If you think this function should be updated, open a PR on GitHub with the implemented version, or open an issue to discuss it first. I think a successful contribution would be easy here, and I'd venture that the case for symmetry is strong: one would expect that processing a symmetric image with an elliptical kernel gives a result that doesn't depend on the image's orientation.
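As an illustration of what a symmetric version could look like, here is a minimal sketch. It is a swapped-in approach, not OpenCV's code, and `symmetric_ellipse` is a hypothetical helper. The sketch centres the ellipse at the true geometric centre of the grid, which is a half-integer position for even sizes. The semi-axes reach the outer pixel edges, and a pixel is set when its centre falls inside the ellipse. For 4x4 it gives exactly the array the question expected, and square kernels come out invariant under 90-degree rotation.

```python
import numpy as np

def symmetric_ellipse(ksize):
    """Elliptical 0/1 kernel centred at the geometric centre of a (height, width) grid."""
    w, h = ksize                              # same (width, height) order as OpenCV
    y, x = np.ogrid[:h, :w]
    cy, cx = (h - 1) / 2, (w - 1) / 2         # half-integer centre for even sizes
    ry, rx = h / 2, w / 2                     # semi-axes reach the outer pixel edges
    inside = ((y - cy) / ry) ** 2 + ((x - cx) / rx) ** 2 <= 1
    return inside.astype(np.uint8)

print(symmetric_ellipse((4, 4)))
```

The result is a uint8 array, so it can be passed as the kernel argument of cv2.erode or cv2.dilate.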

How would I find the mode (stats) of pixel values of an image?

I'm using OpenCV and I can get a single pixel of an image (a 3-element array) with the code below. However, I'm not sure how to calculate the mode of the pixel values in the image.
import cv2
import numpy as np
import matplotlib.pyplot as plt
img = cv2.imread(r'C:\Users\Moondra\ABEO.png')  # raw string so the backslashes stay literal
#px = img[100,100] #gets pixel value
#print (px)
I tried:
from scipy import stats
stats.mode(img)[0]
But this returns an array of shape:
stats.mode(img)[0].shape
(1, 800, 3)
I'm not sure along which dimension stats.mode computes the mode, but I want each pixel value (a 3-element tuple) to be counted as one element.
EDIT:
For clarity, I'm going to lay out exactly what I'm looking for.
Let's say we have an array of shape (3,5,3) that looks like this:
array([[[1, 1, 2], #[1,1,2] = represents the RGB values
[2, 2, 2],
[1, 2, 2],
[2, 1, 1],
[1, 2, 2]],
[[1, 2, 2],
[2, 2, 2],
[2, 2, 2],
[1, 2, 2],
[1, 2, 1]],
[[2, 2, 1],
[2, 2, 1],
[1, 1, 2],
[2, 1, 2],
[1, 1, 2]]])
I would then reshape it into the following array for easier calculation:
array([[1, 1, 2],
[2, 2, 2],
[1, 2, 2],
[2, 1, 1],
[1, 2, 2],
[1, 2, 2],
[2, 2, 2],
[2, 2, 2],
[1, 2, 2],
[1, 2, 1],
[2, 2, 1],
[2, 2, 1],
[1, 1, 2],
[2, 1, 2],
[1, 1, 2]])
which has shape (15,3).
I would like to calculate the mode by counting each distinct RGB triple, as follows:
[1,1,2] = 3
[2,2,2] = 3
[1,2,2] = 4
[2,1,1] = 1
[1,2,1] = 1
[2,2,1] = 2
[2,1,2] = 1
so the mode here would be [1,2,2].
Thank you.
From the description, it seems you are after the pixel value that occurs most often in the input image. Here's one efficient approach using the concept of views:
def get_row_view(a):
    # View each pixel (a row along the last axis) as one opaque np.void element
    void_dt = np.dtype((np.void, a.dtype.itemsize * np.prod(a.shape[-1])))
    a = np.ascontiguousarray(a)
    return a.reshape(-1, a.shape[-1]).view(void_dt).ravel()

def get_mode(img):
    unq, idx, count = np.unique(get_row_view(img), return_index=True, return_counts=True)
    # idx is the first occurrence of each unique pixel; pick the most frequent one
    return img.reshape(-1, img.shape[-1])[idx[count.argmax()]]
We can also make use of np.unique with its axis argument (available in NumPy 1.13 and later), like so:
def get_mode(img):
    unq, count = np.unique(img.reshape(-1, img.shape[-1]), axis=0, return_counts=True)
    return unq[count.argmax()]
Sample run:
In [69]: img = np.random.randint(0,255,(4,5,3))
In [70]: img.reshape(-1,3)[np.random.choice(20,10,replace=0)] = 120
In [71]: img
Out[71]:
array([[[120, 120, 120],
[ 79, 105, 218],
[ 16, 55, 239],
[120, 120, 120],
[239, 95, 209]],
[[241, 18, 221],
[202, 185, 142],
[ 7, 47, 161],
[120, 120, 120],
[120, 120, 120]],
[[120, 120, 120],
[ 62, 41, 157],
[120, 120, 120],
[120, 120, 120],
[120, 120, 120]],
[[120, 120, 120],
[ 0, 107, 34],
[ 9, 83, 183],
[120, 120, 120],
[ 43, 121, 154]]])
In [74]: get_mode(img)
Out[74]: array([120, 120, 120])
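To get the per-colour counts exactly as listed in the question, a plain-Python alternative is `collections.Counter` over pixel tuples. This technique is swapped in here and is not part of the answer above. It is likely slower than the np.unique versions on large images, and `pixel_counts` is a hypothetical helper name. Here it is applied to the (3, 5, 3) example array from the question:

```python
from collections import Counter

import numpy as np

def pixel_counts(img):
    """Count how often each pixel value (a tuple of channel values) occurs."""
    flat = img.reshape(-1, img.shape[-1])        # (H*W, C): one row per pixel
    return Counter(map(tuple, flat.tolist()))

example = np.array([[[1, 1, 2], [2, 2, 2], [1, 2, 2], [2, 1, 1], [1, 2, 2]],
                    [[1, 2, 2], [2, 2, 2], [2, 2, 2], [1, 2, 2], [1, 2, 1]],
                    [[2, 2, 1], [2, 2, 1], [1, 1, 2], [2, 1, 2], [1, 1, 2]]])

counts = pixel_counts(example)
mode_pixel, mode_count = counts.most_common(1)[0]
print(mode_pixel, mode_count)
```

Note that cv2.imread returns channels in BGR order, so for a real image the tuples are (B, G, R).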

Disk structuring element in opencv

I know a disk structuring element can be created in MATLAB as follows:
se=strel('disk',4);
0 0 1 1 1 0 0
0 1 1 1 1 1 0
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
0 1 1 1 1 1 0
0 0 1 1 1 0 0
Is there any function or method, or any other way, to create the same structuring element as above in OpenCV? I know I can create it manually using loops, but I want to know whether a function exists for that.
The closest you can get in OpenCV (not exactly the same) is by calling getStructuringElement():
int sz = 4;
cv::Mat se = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(2*sz-1, 2*sz-1));
This gives the matrix:
[0, 0, 0, 1, 0, 0, 0;
0, 1, 1, 1, 1, 1, 0;
1, 1, 1, 1, 1, 1, 1;
1, 1, 1, 1, 1, 1, 1;
1, 1, 1, 1, 1, 1, 1;
0, 1, 1, 1, 1, 1, 0;
0, 0, 0, 1, 0, 0, 0]
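Try this:
import numpy as np

def estructurant(radius):
    # Disk of pixels within `radius` of the centre, on a (2*radius+1) x (2*radius+1) grid
    kernel = np.zeros((2*radius+1, 2*radius+1), np.uint8)
    y, x = np.ogrid[-radius:radius+1, -radius:radius+1]
    mask = x**2 + y**2 <= radius**2
    kernel[mask] = 1
    # Widen the single-pixel tips on the top, bottom, left and right edges to three pixels
    kernel[0, radius-1:kernel.shape[1]-radius+1] = 1
    kernel[kernel.shape[0]-1, radius-1:kernel.shape[1]-radius+1] = 1
    kernel[radius-1:kernel.shape[0]-radius+1, 0] = 1
    kernel[radius-1:kernel.shape[0]-radius+1, kernel.shape[1]-1] = 1
    return kernel
# For example, estructurant(3) gives the 7x7 kernel shown in the question.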
You could also use skimage.morphology.disk, which produces a symmetric result (unlike cv2.getStructuringElement):
>>> disk(4)
array([[0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 1, 1, 0, 0],
[0, 1, 1, 1, 1, 1, 1, 1, 0],
[0, 1, 1, 1, 1, 1, 1, 1, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 1],
[0, 1, 1, 1, 1, 1, 1, 1, 0],
[0, 1, 1, 1, 1, 1, 1, 1, 0],
[0, 0, 1, 1, 1, 1, 1, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0]], dtype=uint8)
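A loop-free variant thresholds the squared distance from the centre, using a radius that is independent of the grid size. It is a sketch only, and `disk_kernel` is a hypothetical helper. With half_size=3 and radius=3.5 it reproduces the 7x7 MATLAB matrix from the question. I have not checked it against strel('disk', ...) for other sizes, which may use a different approximation. The result is a uint8 array, so it can be passed as the kernel argument of cv2.erode or cv2.dilate.

```python
import numpy as np

def disk_kernel(half_size, radius):
    """0/1 kernel of shape (2*half_size+1, 2*half_size+1); pixels whose centres lie
    within `radius` of the middle pixel are set to 1."""
    y, x = np.ogrid[-half_size:half_size + 1, -half_size:half_size + 1]
    return (x**2 + y**2 <= radius**2).astype(np.uint8)

kernel = disk_kernel(3, 3.5)
print(kernel)
```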

Undefined method for matrix, string to matrix

I am trying to work with matrices; I have a model with an attribute called "board", which is just a 4x4 matrix. I display this board in my view. So far so good. When I click a button, I send the param "board" with, for example, this structure:
{"utf8"=>"✓", "game_master"=>{"board"=>"Matrix[[0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 0]]"}, "commit"=>"Yolo"}
On the other side, in the controller, I try to recreate this board by creating a new GameMaster with board = Matrix[[0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 0]]. So far so good (not really: I know that params[:board] is just a string, and that's my problem). Then, later on, when trying to iterate over the matrix, I get this error:
undefined method `each_with_index' for "Matrix[[0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 0]]":String
Clearly, I bound :board to a string, NOT a matrix. How would I go about converting that string into the corresponding matrix?
Thanks
UPDATE:
game_masters_controller.rb
def step
  @game_master = GameMaster.new(game_master_params)
  @game_master.step
  respond_to do |format|
    format.js
  end
end
And:
private
def game_master_params
  params.require(:game_master).permit(:board)
end
game_master.rb
def initialize(attributes = {})
  attributes.each do |name, value|
    send("#{name}=", value)
  end
  if self.board.nil?
    self.board = get_new_board
  end
end
Simply do:
# scan pulls out every run of digits, ignoring "Matrix", brackets, commas and spaces
arr = params[:game_master][:board].scan(/\d+/).map(&:to_i).each_slice(4).to_a
# => [[0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 0]]
require 'matrix'
matrix = Matrix[*arr]
# => Matrix[[0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 0]]
Quick and dirty, and not very secure:
class GameMaster
  ...
  def board=(attr)
    @board = eval attr
  end
end
I would not run eval on anything submitted via a form. If the matrix is always 4x4, I would probably just submit the values as one long comma-separated string like 0,0,0,1,1,1,0,... Then I would use String#split to turn that string into an array. Once you have one big array, you can slice it into an array of arrays and pass that to Matrix.rows (Matrix.new is private):
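require 'matrix'
string_params = "0,1,1,0,0,1,0,0,1,1,0,0,0,0,1,1" # e.g. the submitted value: 16 entries for a 4x4 board
array_of_ints = string_params.split(',').map(&:to_i)
array_of_arrays = array_of_ints.each_slice(4).to_a
matrix = Matrix.rows(array_of_arrays)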
That should point you in the right direction.
Good luck!
Try this code:
(as the other answers mentioned, it's not secure to eval code coming from an input)
require 'matrix'
m = eval "Matrix[[0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 0]]"
=> Matrix[[0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 0]]
m.transpose
=> Matrix[[0, 0, 0, 1], [0, 0, 0, 1], [0, 1, 1, 0], [0, 1, 0, 0]]
Requiring 'matrix' gives you access to many useful methods; check the documentation for more information:
http://ruby-doc.org/stdlib-2.1.0/libdoc/matrix/rdoc/Matrix.html
