I have tried to take only the first 200 samples from MNIST using PyTorch.
Also, how can I get only the 3 and 8 samples from MNIST?
Can someone help me get them?
I tried:
from torchvision import datasets, transforms

def get_data(batch_size=100):
    transform = transforms.Compose([transforms.ToTensor()])
    all_train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
You can use torch.utils.data.Subset:
from torch.utils.data import Subset
data = Subset(all_train_dataset, range(200))
This way you get a torch.utils.data.Dataset that can be passed to a torch.utils.data.DataLoader.
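For the second part of the question (keeping only the digits 3 and 8), a minimal sketch, assuming a recent torchvision where the MNIST dataset exposes a targets tensor, is to build the index list from the labels and pass it to Subset:
import torch
from torch.utils.data import DataLoader, Subset

# Indices of all samples whose label is 3 or 8
mask = (all_train_dataset.targets == 3) | (all_train_dataset.targets == 8)
indices = torch.nonzero(mask, as_tuple=True)[0]

# Optionally keep only the first 200 of those samples, as in the question
data_3_8 = Subset(all_train_dataset, indices[:200].tolist())
loader = DataLoader(data_3_8, batch_size=100, shuffle=True)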
Update 1
I'm thinking the mistake might be in my detector code.
So, here is my code for using the trained learner/model to predict images.
import requests
import cv2
import numpy as np
# Image, pil2tensor and learn come from fastai.vision and the trained learner

bytes = b''
stream = requests.get(url, stream=True)
# My mobile phone streams video to this url; the streaming resolution is 2048 x 1080
bytes = bytes + stream.raw.read(1024)
a = bytes.find(b'\xff\xd8')  # JPEG start-of-image marker
b = bytes.find(b'\xff\xd9')  # JPEG end-of-image marker
if a != -1 and b != -1:
    jpg = bytes[a:b+2]
    bytes = bytes[b+2:]
    img = cv2.imdecode(np.frombuffer(jpg, dtype=np.uint8), cv2.IMREAD_COLOR)
    processedImg = Image(pil2tensor(img, np.float32).div_(255))
    predict = learn.predict(processedImg)
    self.objectClass = predict[0].obj
I read the documentation of the imdecode() method; it returns the image in B, G, R channel order.
Could it be because a different channel order is used in training than in detection?
Or could it be because I trained with image size 299 x 450, but the input frames from the video stream are 2048 x 1080 and are not resized before detection?
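If the channel order is indeed the culprit, one minimal check (a sketch, using OpenCV's standard colour conversion; the resize values assume the 299 x 450 training size mentioned above) would be to convert the decoded frame to RGB and resize it before wrapping it in a fastai Image:
# Convert OpenCV's BGR output to RGB and match the training resolution
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img_rgb = cv2.resize(img_rgb, (450, 299))  # cv2.resize takes (width, height); training size was 299 x 450

processedImg = Image(pil2tensor(img_rgb, np.float32).div_(255))
predict = learn.predict(processedImg)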
I am new to fastai, ML and Python. I trained my "Birds or Not-Birds" model. The train_loss, valid_loss and error_rate were all improving. When I trained for only 3 epochs, the model worked (meaning it could recognize whether or not there are birds in an image). I then increased to 30 epochs; all the metrics looked very good, but the model no longer recognizes anything: whatever image I give it, it always returns Not-Birds.
Here is the training output:
Here are the plots of learn.recorder
Here is my code:
from fastai.vision import *
from fastai.metrics import error_rate
from fastai.callbacks import EarlyStoppingCallback,SaveModelCallback
from datetime import datetime as dt
from functools import partial
path_img = '/minidata'
train_folder = 'train'
valid_folder = 'validation'
tunedTransform = partial(get_transforms, max_zoom=1.5)
data = ImageDataBunch.from_folder(path=path_img, train=train_folder, valid=valid_folder,
                                  ds_tfms=tunedTransform(),
                                  size=(299, 450), bs=40, classes=['birds', 'others'],
                                  resize_method=ResizeMethod.SQUISH)
data = data.normalize(imagenet_stats)
learn = cnn_learner(data, models.resnet50, metrics=error_rate)
learn.fit_one_cycle(30, max_lr=slice(5e-5,5e-4))
learn.recorder.plot_lr()
learn.recorder.plot()
learn.recorder.plot_losses()
Here is my dataset folder structure:
minidata
    train
        birds (7500 images)
        others (around 7300 images)
    validation
        birds (1008 images)
        others (around 872 images)
Your learning rate schedule is sub-optimal for this dataset. First try to figure out the best learning rate for this network and dataset with the LR finder; you can do this by exploring the loss behaviour for different learning rates with:
learn.lr_find()
learn.recorder.plot()
Edit:
It looks like you are only re-training the last layer group of your network. Instead, try unfreezing more of the network and training those layers too, e.g.:
learn.unfreeze()
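A common fastai v1 pattern (a sketch, assuming the learn object defined above) is to train only the new head first, then unfreeze and fine-tune the whole network with discriminative learning rates picked from the LR finder:
# Stage 1: train only the newly added head for a few epochs
learn.freeze()
learn.fit_one_cycle(3)

# Stage 2: unfreeze everything and fine-tune with lower, layer-group-specific rates
learn.unfreeze()
learn.lr_find()
learn.recorder.plot()            # pick a range just before the loss starts to rise
learn.fit_one_cycle(5, max_lr=slice(1e-5, 1e-4))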
My dataset was restaurant reviews with two columns: review and liked.
Based on the review, it shows whether they liked the restaurant or not.
As a first step I cleaned up the text data, then as a second step I used the bag-of-words model, as below.
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features = 1500)
X = cv.fit_transform(corpus).toarray()
y = dataset.iloc[:, 1].values
This gave me X with 1500 columns of 0s and 1s and 1000 rows, matching my dataset.
I predicted as below
y_pred = classifier.predict(X_test)
So now I have a review such as "Food was good"; how do I predict whether they liked it or not, i.e. a single value to predict?
Please can you help me out? Let me know if additional information is required.
Thanks
All you need is to apply cv.transform first just like so:
>>> test = ['Food was good']
>>> test_vec = cv.transform(test)
>>> classifier.predict(test_vec)
# returns predicted class
For training and testing, here is a simple example:
Training:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
text = ["This is good place","Hyatt is awesome hotel"]
count_vect = CountVectorizer()
tfidf_transformer = TfidfTransformer()
X_train_counts = count_vect.fit_transform(text)
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
pd.DataFrame(X_train_tfidf.todense(), columns = count_vect.get_feature_names())
# Now apply any classifier you want on top of this data set
Now Testing:
Note: use the same transformation as done in training:
new = ["I like the ambiance of this hotel "]
pd.DataFrame(tfidf_transformer.transform(count_vect.transform(new)).todense(),
columns = count_vect.get_feature_names())
Apply model.predict on top of this now.
You can also use an sklearn Pipeline:
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

model_pipeline = Pipeline([('vect', CountVectorizer()),
                           ('tfidf', TfidfTransformer()),
                           ('model', classifier)])  # the (unfitted) classifier of your choice

model_pipeline.fit(x, y)                   # x is your text data, y is your target
model_pipeline.predict(['Food was good'])  # predict a new sentence
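As a concrete end-to-end sketch (the choice of MultinomialNB here is just for illustration; corpus and y are the cleaned reviews and the liked column from the question):
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

model_pipeline = Pipeline([('vect', CountVectorizer(max_features=1500)),
                           ('tfidf', TfidfTransformer()),
                           ('model', MultinomialNB())])

model_pipeline.fit(corpus, y)                      # corpus: list of review strings, y: liked / not liked
print(model_pipeline.predict(['Food was good']))   # class prediction for a single new review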
I'm using the Jena Climate data that my book gives a link to. I have it below:
https://s3.amazonaws.com/keras-datasets/jena_climate_2009_2016.csv.zip
I tried messing with it, but I have no clue why the index is surpassing 200000. I'm not sure why it gets to 200005, since my training data is only 200001 observations long.
I've also gotten an error that said, "Index 200000 is out of bounds for axis 0 with size 200000."
The data is a 420551 x 14 table of weather data. My code is as follows:
import pandas as pd
import numpy as np
import keras
data = pd.read_csv("D:\\School\\Spring_2019\\GraduateProject\\jena_climate_2009_2016_Data\\jena_climate_2009_2016.csv")
data = data.iloc[:,data.columns!='Date Time']
data
# Standardize the Data
from sklearn import preprocessing
data = preprocessing.scale(data[:200000])
# Build Generators
from keras.preprocessing.sequence import TimeseriesGenerator
target = data[:,1] # Should target be scaled?
# ? Do I need to remove targets from the data variable?
trainGen = TimeseriesGenerator(data, targets=target, length=1440,
                               sampling_rate=6,
                               batch_size=190,
                               start_index=0,
                               end_index=200000)
valGen = TimeseriesGenerator(data, targets=target, length=1440,
                             sampling_rate=6,
                             batch_size=190,
                             start_index=199999,
                             end_index=300000)
testGen = TimeseriesGenerator(data, targets=target, length=6,
                              batch_size=128,
                              start_index=300000,
                              end_index=420550)
from keras.models import Sequential
from keras import layers
from keras.optimizers import RMSprop
from keras.layers import LSTM
# Flatten part is: 240 = lookback // step, i.e. 1440 / 6, because we look back 1440 timesteps sampled every 6 steps
model = Sequential()
model.add(layers.Flatten(input_shape=(240,data.shape[-1])))
model.add(layers.Dense(32,activation='relu'))
model.add(layers.Dense(1))
val_steps = 300000-200001-1440
model.compile(optimizer=RMSprop(),loss='mae')
history = model.fit_generator(trainGen,
                              steps_per_epoch=250,
                              epochs=20,
                              validation_data=valGen,
                              validation_steps=val_steps)
Let me know if you need anything else and thank you greatly in advance.
Well, you've only selected the first 200000 rows for your data (data = preprocessing.scale(data[:200000])), so the validation and test generators are out of bounds (index > 200000).
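One way to fix it (a sketch, assuming data is still the DataFrame from the question, before the preprocessing.scale call) is to compute the scaling statistics on the training slice only but apply them to all rows, so the validation and test indices up to 420550 actually exist:
import numpy as np

values = data.values.astype('float32')        # full 420551-row array, 'Date Time' already dropped
train_mean = values[:200000].mean(axis=0)     # statistics from the training portion only
train_std = values[:200000].std(axis=0)
values = (values - train_mean) / train_std    # scale ALL rows, not just the first 200000

target = values[:, 1]                         # same target column as before
# then pass `values` (instead of the truncated array) to the three TimeseriesGenerators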
I have 1320 training samples (sea surface temperature), and each sample is a 2D array of shape (160, 320), so the final array has shape (1320, 160, 320). I would like to normalize them to values between 0 and 1 using MinMaxScaler(). I get the error "Found array with dim 3. MinMaxScaler expected <= 2.". My code is as follows. I could loop through all 1320 samples, normalising them one by one, but I would like to know if there is a way to normalize all of them at once, because the max and min of each sample are not the same.
scaler = prep.MinMaxScaler()
sst = scaler.fit_transform(sst)
As far as I know, you can't really do it using only MinMaxScaler(). np.apply_along_axis won't be useful either, since you want to apply a min-max scaler over 2D slices. One solution could be something like this:
import numpy as np

a = np.random.random((2, 3, 3))

def customMinMaxScaler(X):
    return (X - X.min()) / (X.max() - X.min())

np.array([customMinMaxScaler(x) for x in a])
But I guess it wouldn't be much faster than iterating over the samples.
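If the list comprehension does become a bottleneck, a fully vectorized variant (a sketch, assuming the (1320, 160, 320) array from the question) computes the per-sample min and max with keepdims=True so they broadcast over each 2D slice:
import numpy as np

sst = np.random.random((1320, 160, 320))      # stand-in for the sea surface temperature array

sample_min = sst.min(axis=(1, 2), keepdims=True)   # shape (1320, 1, 1)
sample_max = sst.max(axis=(1, 2), keepdims=True)
sst_scaled = (sst - sample_min) / (sample_max - sample_min)   # each sample independently in [0, 1]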
I'm fitting my Keras model on a sample of images and their corresponding binary masks for object detection. Basically, I'm following the example at the end of this page:
from keras.preprocessing.image import ImageDataGenerator

# we create two instances with the same arguments
data_gen_args = dict(
    rotation_range=4.,
    width_shift_range=0.05,
    height_shift_range=0.05,
    shear_range=0.05,
    zoom_range=0.05,
    horizontal_flip=True,
    fill_mode='nearest')

image_datagen = ImageDataGenerator(**data_gen_args)
mask_datagen = ImageDataGenerator(**data_gen_args)
seed = 2019
Now create generators for images and masks:
target_size = (180, 320)
small_target_size = (11, 20)
batch_size = 8

image_generator_trn = image_datagen.flow_from_directory(
    path + 'train',
    class_mode=None,
    target_size=target_size,
    batch_size=batch_size,
    shuffle=False,
    seed=seed)

mask_generator_trn = mask_datagen.flow_from_directory(
    path + 'mask/train',
    class_mode=None,
    target_size=small_target_size,
    batch_size=batch_size,
    shuffle=False,
    seed=seed)
Output:
Found 3327 images belonging to 2 classes.
Found 3327 images belonging to 2 classes.
Finally we create a generator to be used in model.fit_generator:
train_generator = zip(image_generator_trn, mask_generator_trn)
My problem is with the last line (the zipping); I either get a memory exception or it never finishes executing. I suspect it is trying to zip two infinite loops; I also tried zipping lazily inside model.fit_generator, but hit the same issue.
What can I do differently?
The problem is that zip tries to exhaust both generators, while they are designed to produce outputs infinitely; this is the reason behind the behaviour you see. To overcome this, use the itertools.izip function (Python 2; in Python 3 the built-in zip is already lazy). Moreover, please notice that if you don't set the same seed for both generators, different augmentations will be applied to your x and y images. You need to either turn off random augmentation or set the same seed.
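If itertools.izip is not available (it only exists in Python 2), an equivalent, version-independent sketch is a small wrapper generator that pulls one batch from each iterator per step:
def combine_generators(img_gen, mask_gen):
    # Lazily yield matching (image batch, mask batch) pairs without ever
    # trying to exhaust the two infinite Keras generators
    while True:
        yield next(img_gen), next(mask_gen)

train_generator = combine_generators(image_generator_trn, mask_generator_trn)
# model.fit_generator(train_generator, steps_per_epoch=..., epochs=...)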