After applying imputation, np.nan values are still present - machine-learning

I used SimpleImputer to fill the missing values in df, but the empty rows are still present. What did I do wrong?
import numpy as np
import missingno as msno
from sklearn.impute import SimpleImputer
imp = SimpleImputer(missing_values=np.nan, strategy='most_frequent')
imp.fit_transform(df)
msno.matrix(df)

fit_transform does not modify df in place; it returns the transformed data as a new object (a NumPy array by default), so you have to assign the result:
import numpy as np
from sklearn.impute import SimpleImputer
imp = SimpleImputer(missing_values=np.nan, strategy='most_frequent')
data_without_nans = imp.fit_transform(df)
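A minimal sketch of the same idea, assuming a small made-up DataFrame (the column names a and b and their values are hypothetical, not from the question). It shows that the original frame keeps its NaNs and that the returned array can be wrapped back into a DataFrame if column labels are needed:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical data: each column has one missing value and a clear mode.
df = pd.DataFrame({"a": [1.0, np.nan, 1.0, 3.0],
                   "b": [np.nan, 5.0, 5.0, 7.0]})

imp = SimpleImputer(missing_values=np.nan, strategy="most_frequent")

# fit_transform returns the imputed values; df itself is left untouched.
imputed = imp.fit_transform(df)

# Wrap the result back into a DataFrame to keep the column names and index.
df_imputed = pd.DataFrame(imputed, columns=df.columns, index=df.index)

print(df.isna().sum().sum())          # NaNs remaining in the original frame
print(df_imputed.isna().sum().sum())  # NaNs remaining after imputation
```

Plotting df_imputed instead of df with msno.matrix should then show no gaps.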

Related

Remove digit from MNIST, PyTorch

I'm experimenting with rotating the MNIST digits. Because a 9 is more or less a rotated 6, I'd like to remove all occurrences of 9 from the dataset.
As per this answer, I tried
dataset = datasets.MNIST(root='./data')
idx = dataset.train_labels!=9
dataset.train_labels = dataset.train_labels[idx]
dataset.train_data = dataset.train_data[idx]
which fails because these properties of the MNIST class are read-only.
I'd really like to not have to manually iterate through the entire dataset, and create a list of tuples that I feed to my dataloaders. Am I out of luck?
You can proceed as follows, replacing train_labels with targets and train_data with data:
from torchvision import datasets
dataset = datasets.MNIST(root='data')
idx = dataset.targets!=9
dataset.targets = dataset.targets[idx]
dataset.data = dataset.data[idx]
Indeed, as you can see at https://pytorch.org/vision/stable/_modules/torchvision/datasets/mnist.html#MNIST, train_labels and train_data are now defined as read-only properties (no setter), so they can't be assigned to, while targets and data are plain public attributes that can be reassigned.
@property
def train_labels(self):
    warnings.warn("train_labels has been renamed targets")
    return self.targets

@property
def train_data(self):
    warnings.warn("train_data has been renamed data")
    return self.data
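To illustrate why the assignment fails, here is a small sketch using a toy class. ToyDataset is a hypothetical stand-in written for this example, not torchvision code; it only mimics the pattern of a getter-only property sitting on top of a plain attribute:

```python
import warnings


class ToyDataset:
    """Hypothetical stand-in that mimics the property pattern shown above."""

    def __init__(self, data, targets):
        self.data = data        # plain attribute, can be reassigned
        self.targets = targets  # plain attribute, can be reassigned

    @property
    def train_labels(self):
        # Getter only: there is no setter, so assignment is rejected.
        warnings.warn("train_labels has been renamed targets")
        return self.targets


ds = ToyDataset(data=["img0", "img1", "img2"], targets=[3, 9, 6])

# Assigning to the property raises AttributeError.
try:
    ds.train_labels = [3, 6]
    assignment_failed = False
except AttributeError:
    assignment_failed = True

# Assigning to the underlying attributes works, which is what the fix relies on.
keep = [t != 9 for t in ds.targets]
ds.data = [d for d, k in zip(ds.data, keep) if k]
ds.targets = [t for t, k in zip(ds.targets, keep) if k]

print(assignment_failed, ds.targets, ds.data)
```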

Concatenate images with same size using vstack

I'm using vstack to concatenate two images, but after concatenation there is a visible line between them. Is there a proper way to remove that line, or another way to create a seamless repeating pattern image by concatenation?
import cv2
import numpy as np
im1 = cv2.imread('test1.jpeg')
y=0
x=0
h=2000
w=2000
im1 = im1[y:y+h, x:x+w]
im_v = cv2.vconcat([im1, im1, im1])
im_v2 = cv2.hconcat([im_v,im_v, im_v])
cv2.imwrite('opencv_vconcat.png', im_v2)

Using the LSTM layer in encoder in Pytorch

I want to build an autoencoder with LSTM layers. But, at the first step of the encoder, I got an error. Could you please help me with that?
Here is the model which I tried to build:
import numpy as np
import torch
import torch.nn as nn
r_input = torch.nn.LSTM(1, 1, 28)
activation = nn.functional.relu
mu_r = nn.Linear(22, 6)
log_var_r = nn.Linear(22, 6)
y = np.random.rand(1, 1, 28)
def encode_r(y):
    y = torch.reshape(y, (-1, 1, 28)) # torch.Size([batch_size, 1, 28])
    hidden = torch.flatten(activation(r_input(y)), start_dim = 1)
    z_mu = mu_r(hidden)
    z_log_var = log_var_r(hidden)
    return z_mu, z_log_var
But I got this error in my code:
RuntimeError: input.size(-1) must be equal to input_size. Expected 1, got 28.
You're not creating the layer in the correct way.
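torch.nn.LSTM takes input_size as its first argument, hidden_size as the second and num_layers as the third, so LSTM(1, 1, 28) declares a 28-layer LSTM with input_size=1, while the last dimension of your tensor is 28. It seems that you want the encoder to output a tensor with a dimension of 22, so hidden_size should be 22. You're also passing the batch as the first dimension, so you need to include batch_first=True as an argument.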
r_input = torch.nn.LSTM(28, 22, batch_first=True)
This should work for your specific setup. Also note that LSTM returns a tuple (output, (h_n, c_n)); the first element, output, is the one you want to use.
hidden = torch.flatten(activation(r_input(y)[0]), start_dim=1)
Please read the torch.nn.LSTM entry in the official documentation for more information.
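Putting the two fixes together, a minimal sketch of the corrected encoder could look like this. The batch size of 4 and the random torch.rand input are assumptions made for illustration; the layer sizes are the ones from the question:

```python
import torch
import torch.nn as nn

# input_size=28 matches the last dimension of the input; hidden_size=22
# matches the in_features of the two Linear layers below.
r_input = nn.LSTM(28, 22, batch_first=True)
activation = nn.functional.relu
mu_r = nn.Linear(22, 6)
log_var_r = nn.Linear(22, 6)


def encode_r(y):
    y = torch.reshape(y, (-1, 1, 28))   # (batch_size, seq_len=1, 28)
    output, _ = r_input(y)              # output: (batch_size, 1, 22)
    hidden = torch.flatten(activation(output), start_dim=1)  # (batch_size, 22)
    z_mu = mu_r(hidden)
    z_log_var = log_var_r(hidden)
    return z_mu, z_log_var


# Assumed example input: a float tensor, batch of 4.
y = torch.rand(4, 1, 28)
z_mu, z_log_var = encode_r(y)
print(z_mu.shape, z_log_var.shape)
```

Note that the input here is a torch tensor rather than a NumPy array; a NumPy array would need to be converted first, for example with torch.from_numpy(...).float().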

Why isn't Python OpenCV HoughP Transform able to identify all the spaced lines?

When lines are spaced 1px apart, the HoughP transform of Python OpenCV doesn't mark all the points.
I used:
cv2.HoughLinesP(img,1,np.pi/180,400)
Theoretically it should be working fine be it dashed or non dashed. In this case it doesn't mark all the lines if they are on the same height.
HoughP Transform Sample Output
The Green Lines indicate the white lines that were identified.
I changed the parameters to this:
cv2.HoughLinesP(img,1,np.pi/180,10,10,10)
And got this output. As you can see, the detection is still missing some parts. It's unclear why, for a straight line, a shorter segment is marked but a longer one is not.
*** After the method suggested by Robert
Input image
Here is the code:
import numpy as np
import cv2
img = cv2.imread("in.PNG")
img2 = img.copy()  # colour copy used to draw the detected lines
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
ret, thresh1 = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
# Note: the grayscale image, not thresh1, is passed to HoughLinesP
lines = cv2.HoughLinesP(img, rho=1, theta=1*np.pi/180, threshold=10,
                        minLineLength=10, maxLineGap=10)
N = lines.shape[0]
print(lines)
for i in range(N):
    x1 = lines[i][0][0]
    y1 = lines[i][0][1]
    x2 = lines[i][0][2]
    y2 = lines[i][0][3]
    cv2.line(img2, (x1, y1), (x2, y2), (0, 255, 0), 1)
#cv2.imshow("Window",thresh1)
cv2.imwrite("out.PNG", img2)

How to import and use scipy.spatial.distance functions correctly?

from scipy.spatial.distance import seuclidean #imports abridged
import scipy
img = np.asarray(Image.open("testtwo.tif").convert('L'))
img = 1 * (img < 127)
area = (img == 0).sum() # computing white pixel area
print(area)
areasplit = np.split(img, 24) # splitting image array
print(areasplit)
for i in areasplit:
    result = (i == 0).sum()
    print(result) # computing white pixel area for every single array
minimal = result.min()
maximal = result.max()
dist = seuclidian(minimal, maximal)
print(dist)
I want to compute distances between array elements produced by splitting an image. Python can't recognize the names of the distance functions (I have tried several of them and various approaches to importing the modules). How do I import and call these functions correctly? Thank you.
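You haven't stated what the error is, but you are using numpy as well and I can't see an import for it. Also note that the functions are spelled euclidean and seuclidean; your code calls seuclidian, which is not defined.
Try
import numpy as np
import scipy.spatial.distance
Then try
dist = scipy.spatial.distance.euclidean(minimal, maximal)
dists = scipy.spatial.distance.seuclidean(minimal, maximal, variances)
Note - the standardised Euclidean distance takes a third parameter, the per-component variances. Both functions are documented as taking 1-D arrays, so scalar values such as minimal and maximal may need to be wrapped in a list or array first.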
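As a small self-contained sketch of both calls, assuming made-up 1-D vectors u and v and variances V (these values are illustrative and not taken from the question's image):

```python
import numpy as np
from scipy.spatial import distance

# Hypothetical 1-D vectors and per-component variances.
u = np.array([0.0, 0.0])
v = np.array([3.0, 4.0])
V = np.array([9.0, 16.0])

# Plain Euclidean distance: sqrt(3^2 + 4^2)
d_plain = distance.euclidean(u, v)

# Standardised Euclidean distance: sqrt(sum((u_i - v_i)^2 / V_i))
d_std = distance.seuclidean(u, v, V)

print(d_plain, d_std)
```

For the image case, the per-chunk areas could be collected into one array first, for example np.array([(chunk == 0).sum() for chunk in areasplit]), so that min, max and distances are computed over all chunks rather than over the last loop value only.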
