How to build a predictive model from time-series data consisting of binary information? - machine-learning

How would you do regression to predict the state at a future time:
   Series  Year  Month  Day  State
0       1  2019     12   13  [1, 0, 0, 1, 0, 0]
1       2  2019     12   17  [0, 1, 0, 0, 1, 0]
2       3  2019     12   20  [0, 0, 1, 0, 1, 0]
3       4  2019     12   24  [0, 1, 0, 1, 0, 0]
4       5  2019     12   27  [0, 1, 0, 0, 1, 0]
5       6  2019     12   31  [0, 0, 0, 1, 0, 1]
6       7  2020      1    3  [1, 0, 0, 0, 0, 1]
.
.
. some future date ?
Basically, I want to know the state at some future time, in the form of a binary list.
NOTE:
Every single row has its own unique state that is not the same as any other row.
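One simple baseline for this kind of data, sketched here under strong assumptions, is a per-bit first-order Markov estimate: for each position in the list, count how often it was 1 at the next observation given its current value. The function name next_state_probabilities and the smoothing parameter alpha are hypothetical, each bit is treated independently, and the irregular spacing of the dates is ignored, so this is a starting point rather than a finished model.

```python
import numpy as np

# States copied from the question, one row per observation date.
states = np.array([
    [1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0, 1],
])

def next_state_probabilities(states, alpha=1.0):
    """Per-bit estimate of P(next bit = 1 | current bit value).

    alpha is a Laplace smoothing constant (an assumption, not from the question).
    """
    prev, nxt = states[:-1], states[1:]
    last = states[-1]
    probs = np.empty(states.shape[1])
    for j in range(states.shape[1]):
        # Transitions that started from the value the bit has right now.
        same_context = prev[:, j] == last[j]
        ones = nxt[same_context, j].sum()
        probs[j] = (ones + alpha) / (same_context.sum() + 2 * alpha)
    return probs

probs = next_state_probabilities(states)
prediction = (probs >= 0.5).astype(int)  # threshold each bit independently
print(probs.round(2), prediction)
```

With more history, the same lagged-state idea can be fed to any multi-output classifier instead of the counting rule used here.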

Related

How to predict Total Hours needed with List as Input?

I am struggling with the following problem:
I have a dataset of different products (cars) that have certain work orders open at a given time. I know from historical data how much time this work took in TOTAL.
Now I want to predict it for another car (e.g. Car 3).
Which type of algorithm or regression should I use for this?
My idea was to transform this row-based dataset into a column-based one with binary values, e.g. Brake: 0/1, Screen: 0/1. But then I will have a lot of inputs, as the number of possible inputs is 100-200.
Here's a quick idea using multi-factor regression for 30 jobs, each of which is some random accumulation of 6 tasks with a "true cost" for each task. We can regress against the task selections in each job to estimate the cost coefficients that best explain the total job costs.
First done w/ no "noise" in the system (tasks are exact), then with some random noise.
A "more thorough" job would include examining the R-squared value and plotting the residuals to ensure linearity.
In [1]: from sklearn import linear_model
In [2]: import numpy as np
In [3]: jobs = np.random.binomial(1, 0.6, (30, 6))
In [4]: true_costs = np.array([10, 20, 5, 53, 31, 42])
In [5]: jobs
Out[5]:
array([[0, 1, 1, 1, 1, 0],
[1, 0, 0, 1, 0, 1],
[1, 1, 0, 1, 0, 0],
[1, 0, 1, 1, 1, 1],
[1, 1, 0, 0, 1, 1],
[0, 1, 0, 0, 1, 0],
[1, 0, 0, 1, 1, 0],
[1, 1, 1, 1, 0, 1],
[1, 0, 0, 1, 0, 1],
[0, 1, 0, 1, 0, 0],
[0, 0, 1, 0, 1, 1],
[1, 0, 1, 1, 1, 1],
[0, 1, 1, 1, 1, 1],
[1, 0, 1, 1, 1, 1],
[0, 1, 1, 0, 1, 0],
[1, 0, 1, 0, 1, 0],
[1, 1, 1, 1, 1, 1],
[1, 0, 1, 0, 0, 1],
[0, 1, 0, 1, 1, 0],
[1, 1, 1, 0, 1, 0],
[1, 1, 1, 1, 1, 0],
[1, 0, 1, 0, 0, 1],
[0, 0, 0, 1, 1, 1],
[1, 1, 0, 1, 1, 1],
[1, 0, 1, 1, 0, 1],
[1, 1, 1, 1, 1, 1],
[1, 0, 1, 1, 1, 1],
[0, 0, 1, 1, 0, 0],
[1, 1, 0, 0, 1, 1],
[1, 1, 1, 1, 0, 0]])
In [6]: tot_job_costs = jobs @ true_costs
In [7]: reg = linear_model.LinearRegression()
In [8]: reg.fit(jobs, tot_job_costs)
Out[8]: LinearRegression()
In [9]: reg.coef_
Out[9]: array([10., 20., 5., 53., 31., 42.])
In [10]: np.random.normal?
In [11]: noise = np.random.normal(0, scale=5, size=30)
In [12]: noisy_costs = tot_job_costs + noise
In [13]: noisy_costs
Out[13]:
array([113.94632664, 103.82109478, 78.73776288, 145.12778089,
104.92931235, 48.14676751, 94.1052639 , 134.64827785,
109.58893129, 67.48897806, 75.70934522, 143.46588308,
143.12160502, 147.71249157, 53.93020167, 44.22848841,
159.64772255, 52.49447057, 102.70555991, 69.08774251,
125.10685342, 45.79436364, 129.81354375, 160.92510393,
108.59837665, 149.1673096 , 135.12600871, 60.55375843,
107.7925208 , 88.16833899])
In [14]: reg.fit(jobs, noisy_costs)
Out[14]: LinearRegression()
In [15]: reg.coef_
Out[15]:
array([12.09045186, 19.0013987 , 3.44981506, 55.21114084, 33.82282467,
40.48642199])
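The R-squared and residual check mentioned in the answer could look roughly like the sketch below. It uses plain NumPy least squares with an explicit intercept column instead of LinearRegression, and a fixed random seed, so the numbers will not match the session above; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)                 # fixed seed so the run is repeatable
jobs = rng.binomial(1, 0.6, (30, 6))           # 30 jobs, 6 possible tasks each
true_costs = np.array([10, 20, 5, 53, 31, 42])
noisy_costs = jobs @ true_costs + rng.normal(0, 5, size=30)

# Least-squares fit with an intercept column.
X = np.column_stack([np.ones(len(jobs)), jobs])
coef, *_ = np.linalg.lstsq(X, noisy_costs, rcond=None)

fitted = X @ coef
residuals = noisy_costs - fitted

# R-squared: share of the variance in total cost explained by the fit.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((noisy_costs - noisy_costs.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print("estimated task costs:", coef[1:].round(2))
print("R-squared:", round(r_squared, 3))
```

Plotting residuals against fitted (for example with a scatter plot) should show no visible trend if the additive, linear cost assumption holds.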

Yolov5 model not able to train

I'm making a model to detect potholes in an image. I've done everything right or so it seems to me, but I can't train the model for some reason. What might be the problem here?
!python train.py --img 640 --cfg yolov5m.yaml --hyp data/hyps/hyp.scratch-med.yaml --batch 20 --epochs 300 --data data/potholeData.yaml --weights yolov5m.pt --workers 4 --name yolo_pothole_det_m
This is the final line of the code, which outputs the following.
train: weights=yolov5m.pt, cfg=yolov5m.yaml, data=data/potholeData.yaml, hyp=data/hyps/hyp.scratch-med.yaml, epochs=300, batch_size=20, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, bucket=, cache=None, image_weights=False, device=, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=4, project=runs/train, name=yolo_pothole_det_m, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, seed=0, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
github: up to date with https://github.com/ultralytics/yolov5 ✅
YOLOv5 🚀 v7.0-23-g5dc1ce4 Python-3.9.13 torch-1.13.0 CPU
hyperparameters: lr0=0.01, lrf=0.1, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.3, cls_pw=1.0, obj=0.7, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.9, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.1, copy_paste=0.0
ClearML: run 'pip install clearml' to automatically track, visualize and remotely train YOLOv5 🚀 in ClearML
Comet: run 'pip install comet_ml' to automatically track and visualize YOLOv5 🚀 runs in Comet
TensorBoard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
Overriding model.yaml nc=80 with nc=1
from n params module arguments
0 -1 1 5280 models.common.Conv [3, 48, 6, 2, 2]
1 -1 1 41664 models.common.Conv [48, 96, 3, 2]
2 -1 2 65280 models.common.C3 [96, 96, 2]
3 -1 1 166272 models.common.Conv [96, 192, 3, 2]
4 -1 4 444672 models.common.C3 [192, 192, 4]
5 -1 1 664320 models.common.Conv [192, 384, 3, 2]
6 -1 6 2512896 models.common.C3 [384, 384, 6]
7 -1 1 2655744 models.common.Conv [384, 768, 3, 2]
8 -1 2 4134912 models.common.C3 [768, 768, 2]
9 -1 1 1476864 models.common.SPPF [768, 768, 5]
10 -1 1 295680 models.common.Conv [768, 384, 1, 1]
11 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
12 [-1, 6] 1 0 models.common.Concat [1]
13 -1 2 1182720 models.common.C3 [768, 384, 2, False]
14 -1 1 74112 models.common.Conv [384, 192, 1, 1]
15 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
16 [-1, 4] 1 0 models.common.Concat [1]
17 -1 2 296448 models.common.C3 [384, 192, 2, False]
18 -1 1 332160 models.common.Conv [192, 192, 3, 2]
19 [-1, 14] 1 0 models.common.Concat [1]
20 -1 2 1035264 models.common.C3 [384, 384, 2, False]
21 -1 1 1327872 models.common.Conv [384, 384, 3, 2]
22 [-1, 10] 1 0 models.common.Concat [1]
23 -1 2 4134912 models.common.C3 [768, 768, 2, False]
24 [17, 20, 23] 1 24246 models.yolo.Detect [1, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [192, 384, 768]]
Isn't it supposed to train the model after that? What am I doing wrong for it to stop it right here?
The console output shows that it didn't read any image dataset. Make sure that your potholeData.yaml file is in the right location. That file should contain something like this:
train: ../train/images #path to train images
val: ../valid/images #path to valid images
nc: 1 #number of classes
names: ['Weapon'] #name of classes
Adjust the paths and the class name to your own dataset. After this, you can run the command again and training should continue.
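A quick way to confirm that the paths in the YAML actually point at images is to count the files before starting training. This is a minimal sketch using only the standard library; count_images and the list of extensions are hypothetical, and the two folders are simply the ones from the YAML above, resolved relative to the directory you run the script from.

```python
from pathlib import Path

# Assumed set of image extensions; extend it if your dataset uses others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}

def count_images(folder):
    """Return the number of image files directly inside folder (0 if it is missing)."""
    folder = Path(folder)
    if not folder.is_dir():
        return 0
    return sum(1 for p in folder.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)

# Paths taken from the YAML snippet above.
for split, folder in {"train": "../train/images", "val": "../valid/images"}.items():
    print(split, folder, count_images(folder))
```

A count of 0 for either split suggests the path in the YAML is wrong or the folder is empty.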

How to zero pad on both sides and encode the sequence into one hot in keras?

I have text data as follows.
X_train_orignal= np.array(['OC(=O)C1=C(Cl)C=CC=C1Cl', 'OC(=O)C1=C(Cl)C=C(Cl)C=C1Cl',
'OC(=O)C1=CC=CC(=C1Cl)Cl', 'OC(=O)C1=CC(=CC=C1Cl)Cl',
'OC1=C(C=C(C=C1)[N+]([O-])=O)[N+]([O-])=O'])
As is evident, the sequences have different lengths. How can I zero-pad each sequence on both sides up to some maximum length, and then convert each sequence into a one-hot encoding based on its characters?
What I tried:
I used the following Keras API, but it doesn't work with string sequences.
keras.preprocessing.sequence.pad_sequences(sequences, maxlen=None, dtype='int32', padding='pre', truncating='pre', value=0.0)
I might need to convert my sequence data into one-hot vectors first and then zero-pad it. For that I tried to use Tokenizer as follows.
tk = Tokenizer(nb_words=?, split=?)
But then, what should the split and nb_words values be, given that my sequence data doesn't contain any spaces? How can I use it for character-based one-hot encoding?
My overall goal is to zero-pad my sequences and convert them to one-hot before I feed them into an RNN.
I then came across a way to do this by using Tokenizer first and then pad_sequences to zero-pad my sequences at the start, as follows.
from keras.preprocessing.text import Tokenizer
tokenizer = Tokenizer(char_level=True)
tokenizer.fit_on_texts(X_train_orignal)
sequence_of_int = tokenizer.texts_to_sequences(X_train_orignal)
This gives me the output as follows.
[[3, 1, 4, 2, 3, 5, 1, 6, 2, 1, 4, 1, 7, 5, 1, 2, 1, 1, 2, 1, 6, 1, 7],
[3,
1,
4,
2,
3,
5,
1,
6,
2,
1,
4,
1,
7,
5,
1,
2,
1,
4,
1,
7,
5,
1,
2,
1,
6,
1,
7],
[3, 1, 4, 2, 3, 5, 1, 6, 2, 1, 1, 2, 1, 1, 4, 2, 1, 6, 1, 7, 5, 1, 7],
[3, 1, 4, 2, 3, 5, 1, 6, 2, 1, 1, 4, 2, 1, 1, 2, 1, 6, 1, 7, 5, 1, 7],
[3,
1,
6,
2,
1,
4,
1,
2,
1,
4,
1,
2,
1,
6,
5,
8,
10,
11,
9,
4,
8,
3,
12,
9,
5,
2,
3,
5,
8,
10,
11,
9,
4,
8,
3,
12,
9,
5,
2,
3]]
Now I do not understand why sequence_of_int[1] and sequence_of_int[4] are printed in column format.
After getting the tokens, I applied the pad_sequences as follows.
seq=keras.preprocessing.sequence.pad_sequences(sequence_of_int, maxlen=None, dtype='int32', padding='pre', value=0.0)
and it gives me the output as follows.
array([[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 3, 1, 4, 2, 3, 5, 1, 6, 2, 1, 4, 1, 7, 5, 1,
2, 1, 1, 2, 1, 6, 1, 7],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 1, 4,
2, 3, 5, 1, 6, 2, 1, 4, 1, 7, 5, 1, 2, 1, 4, 1,
7, 5, 1, 2, 1, 6, 1, 7],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 3, 1, 4, 2, 3, 5, 1, 6, 2, 1, 1, 2, 1, 1, 4,
2, 1, 6, 1, 7, 5, 1, 7],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 3, 1, 4, 2, 3, 5, 1, 6, 2, 1, 1, 4, 2, 1, 1,
2, 1, 6, 1, 7, 5, 1, 7],
[ 3, 1, 6, 2, 1, 4, 1, 2, 1, 4, 1, 2, 1, 6, 5, 8,
10, 11, 9, 4, 8, 3, 12, 9, 5, 2, 3, 5, 8, 10, 11, 9,
4, 8, 3, 12, 9, 5, 2, 3]], dtype=int32)
Then after that, I converted it into one hot as follows.
one_hot=keras.utils.to_categorical(seq)
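The steps above pad only at the start (padding='pre'). Padding on both sides, as asked in the question, can be sketched with plain NumPy instead of the Keras helpers. The helper name pad_both_sides is hypothetical, index 0 is reserved for padding, and when the padding is odd the extra zero goes on the right (an arbitrary choice). Unlike Tokenizer, this mapping is case-sensitive.

```python
import numpy as np

smiles = ['OC(=O)C1=C(Cl)C=CC=C1Cl', 'OC(=O)C1=C(Cl)C=C(Cl)C=C1Cl',
          'OC(=O)C1=CC=CC(=C1Cl)Cl', 'OC(=O)C1=CC(=CC=C1Cl)Cl',
          'OC1=C(C=C(C=C1)[N+]([O-])=O)[N+]([O-])=O']

# Character vocabulary; index 0 is kept free for the padding symbol.
chars = sorted(set("".join(smiles)))
char_to_int = {c: i + 1 for i, c in enumerate(chars)}

max_len = max(len(s) for s in smiles)

def pad_both_sides(seq, max_len):
    """Centre seq inside a list of length max_len, filling with zeros."""
    total = max_len - len(seq)
    left = total // 2
    return [0] * left + seq + [0] * (total - left)

encoded = np.array([pad_both_sides([char_to_int[c] for c in s], max_len)
                    for s in smiles])

# Row k of the identity matrix is the one-hot vector for index k.
one_hot = np.eye(len(chars) + 1, dtype=np.float32)[encoded]
print(encoded.shape, one_hot.shape)
```

The resulting array has shape (samples, max_len, vocabulary size + 1), which is the layout an RNN layer expects for one-hot input.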

Finding the nullity and nullspace in Maxima

I was trying to get the nullity and kernel of a matrix over the complex field in Maxima.
I get strange results, though.
I can define a matrix A:
M : matrix([0, 1, 1, 0], [-1, 0, 0, 1], [0, 0, 0, 1], [0, 0, -1, 0]);
A : M + %i * ident(4);
... for reference, it looks like this:
%i 1 1 0
-1 %i 0 1
0 0 %i 1
0 0 -1 %i
If I then compute the nullity with nullity(A), I get 3.
If I compute the rank with rank(A), I also get 3.
And if I compute the nullspace with nullspace(A), I get:
span([-1, %i, 0, 0], [-%i, -1, 0, 0], [2%i, 2, 0, 0])
But this is pretty weird, because -%i * second(...) is [-1, %i, 0, 0], which is the first vector.
And indeed, when I do NullSpace[{{i, 1, 1, 0}, {-1, i, 0, 1}, {0, 0, i, 1}, {0, 0, -1, i}}] in Mathematica, I get that the nullspace has basis [%i, 1, 0, 0] and is 1-dimensional (not 3-dimensional).
What am I doing wrong?
You are doing everything right, as far as I can tell. The problem is a bug in Maxima, which I have reported: https://sourceforge.net/p/maxima/bugs/3158/
I don't see any simple way to work around it. I am working on fixing the bug.
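Until the bug is fixed, the result can be cross-checked numerically outside Maxima. The sketch below uses NumPy (not Maxima) to compute the rank of the same matrix and to verify the basis vector that Mathematica reported in the question.

```python
import numpy as np

M = np.array([[0, 1, 1, 0],
              [-1, 0, 0, 1],
              [0, 0, 0, 1],
              [0, 0, -1, 0]], dtype=complex)
A = M + 1j * np.eye(4)

# Rank-nullity: nullity = number of columns - rank.
rank = np.linalg.matrix_rank(A)
nullity = A.shape[1] - rank

# Candidate basis vector reported by Mathematica in the question.
v = np.array([1j, 1, 0, 0])

print("rank:", rank, "nullity:", nullity)
print("A @ v is zero:", np.allclose(A @ v, 0))
```

A rank of 3 together with a nullity of 1 agrees with the Mathematica result, which means nullity(A) returning 3 is the incorrect value.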

Disk structuring element in opencv

I know a disk structuring element can be created in MATLAB as follows:
se=strel('disk',4);
0 0 1 1 1 0 0
0 1 1 1 1 1 0
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
0 1 1 1 1 1 0
0 0 1 1 1 0 0
Is there any function or method in OpenCV that creates the same structuring element as above? I know we can build it manually with loops, but I want to know whether a function exists for it.
The closest one (not the exact same) you can get in OpenCV is by calling getStructuringElement():
int sz = 4;
cv::Mat se = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(2*sz-1, 2*sz-1));
This gives the matrix with values
[0, 0, 0, 1, 0, 0, 0;
0, 1, 1, 1, 1, 1, 0;
1, 1, 1, 1, 1, 1, 1;
1, 1, 1, 1, 1, 1, 1;
1, 1, 1, 1, 1, 1, 1;
0, 1, 1, 1, 1, 1, 0;
0, 0, 0, 1, 0, 0, 0]
import numpy as np

def estructurant(radius):
    # Square kernel of side 2*radius + 1, initially all zeros
    kernel = np.zeros((2*radius+1, 2*radius+1), np.uint8)
    y, x = np.ogrid[-radius:radius+1, -radius:radius+1]
    # Fill the Euclidean disk x^2 + y^2 <= radius^2
    mask = x**2 + y**2 <= radius**2
    kernel[mask] = 1
    # Set three pixels centred on each of the four border edges
    kernel[0, radius-1:kernel.shape[1]-radius+1] = 1
    kernel[kernel.shape[0]-1, radius-1:kernel.shape[1]-radius+1] = 1
    kernel[radius-1:kernel.shape[0]-radius+1, 0] = 1
    kernel[radius-1:kernel.shape[0]-radius+1, kernel.shape[1]-1] = 1
    return kernel

Try this.
You could also use skimage.morphology.disk, which produces a symmetric result (unlike cv2.getStructuringElement):
>>> from skimage.morphology import disk
>>> disk(4)
array([[0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 1, 1, 0, 0],
[0, 1, 1, 1, 1, 1, 1, 1, 0],
[0, 1, 1, 1, 1, 1, 1, 1, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 1],
[0, 1, 1, 1, 1, 1, 1, 1, 0],
[0, 1, 1, 1, 1, 1, 1, 1, 0],
[0, 0, 1, 1, 1, 1, 1, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0]], dtype=uint8)
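The array above is consistent with a plain Euclidean test, x**2 + y**2 <= radius**2, on a (2*radius + 1) square grid. If skimage is not available, a NumPy-only sketch of that test looks like this; euclidean_disk is a hypothetical name, not a library function.

```python
import numpy as np

def euclidean_disk(radius):
    """Binary disk: 1 where x**2 + y**2 <= radius**2 on a (2r+1) x (2r+1) grid."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return (x**2 + y**2 <= radius**2).astype(np.uint8)

print(euclidean_disk(4))
```

The result is a uint8 array, so it can be passed as the kernel argument to OpenCV morphology functions such as cv2.erode or cv2.dilate.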
