TensorFlow 1.2.1 and InceptionV3 to classify an image - machine-learning

I'm trying to build an example using the Keras API bundled with the latest version of Google's TensorFlow. The example should classify a classic image of an elephant. The code looks like this:
# Import a few libraries for use later
from PIL import Image as IMG
from tensorflow.contrib.keras.python.keras.preprocessing import image
from tensorflow.contrib.keras.python.keras.applications.inception_v3 import InceptionV3
from tensorflow.contrib.keras.python.keras.applications.inception_v3 import preprocess_input, decode_predictions
# Get a copy of the Inception model
print('Loading Inception V3...\n')
model = InceptionV3(weights='imagenet', include_top=True)
print ('Inception V3 loaded\n')
# Read the elephant JPG
elephant_img = IMG.open('elephant.jpg')
# Convert the elephant to an array
elephant = image.img_to_array(elephant_img)
elephant = preprocess_input(elephant)
elephant_preds = model.predict(elephant)
print ('Predictions: ', decode_predictions(elephant_preds))
Unfortunately I'm getting an error when trying to evaluate the model with model.predict:
ValueError: Error when checking : expected input_1 to have 4 dimensions, but got array with shape (299, 299, 3)
This code is taken from and based on the excellent example coremltools-keras-inception and will be expanded more when it is figured out.

This error occurs because the model always expects a batch of examples, not a single example. This diverges from the common understanding of a model as a mathematical function of its input. The model expects batches for two reasons:
Models are computationally designed to work faster on batches in order to speed up training.
Some algorithms take the batch nature of the input into account (e.g. Batch Normalization or GAN training tricks).
So the four dimensions are a first sample / batch dimension followed by the three image dimensions.

Actually, I found the answer. Even though the documentation gives a three-dimensional input shape when the top layer is included, the model's input is still set up to take a batch of images. Thus we need to add this line before the prediction call (with numpy imported):
elephant = numpy.expand_dims(elephant, axis=0)
Then the tensor has the right shape and everything works correctly. I am still uncertain why the documentation states that the input shape should be (3x299x299) or (299x299x3) when the model clearly wants 4 dimensions.
Be careful!
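As a minimal sketch of the fix described above, the snippet below adds the batch axis with NumPy alone. The random array is only a stand-in for the preprocessed elephant image (an assumption so that the snippet runs without the JPG or TensorFlow); the name elephant_batch is illustrative.

```python
import numpy as np

# Stand-in for the preprocessed elephant image: (height, width, channels).
# Random values are used only so the snippet runs without elephant.jpg.
elephant = np.random.rand(299, 299, 3).astype("float32")

# Add a leading batch axis so the array becomes a batch of one image.
elephant_batch = np.expand_dims(elephant, axis=0)

print(elephant.shape)        # (299, 299, 3)
print(elephant_batch.shape)  # (1, 299, 299, 3)
```

An array of this four-dimensional shape is what model.predict expects; the single image is still available as elephant_batch[0].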

Related

Applying two training data sets to model.fit or combining the results of two Image Generator function for our CNN model

Does anyone know how we can apply two training data sets into the Model.fit section of our CNN model?
To put my question another way: I am applying some augmentation strategies to my images using the ImageDataGenerator class in Keras to increase the amount of training data. Is there a straightforward way to combine the output of two ImageDataGenerator instances, without saving the images into directories, and then use them in our model?
train_batches1 = ImageDataGenerator(rescale=1./255).flow_from_directory(
    directory="/content/gdrive/Shareddrives/Yihai, Brandon and Mostafa (1)/Images/Cross validation/Fold1/Train",
    target_size=(64, 64),
    classes=['Normal', 'OR21_6', 'OR7_6', 'OR14_6', 'OR7_12', 'OR7_3', 'OR21_3', 'OR21_12'],
    batch_size=10)
train_batches2 = ImageDataGenerator(rescale=1./255, horizontal_flip=True).flow_from_directory(
    directory="/content/gdrive/Shareddrives/Yihai, Brandon and Mostafa (1)/Images/Cross validation/Fold1/Train",
    target_size=(64, 64),
    classes=['Normal', 'OR21_6', 'OR7_6', 'OR14_6', 'OR7_12', 'OR7_3', 'OR21_3', 'OR21_12'],
    batch_size=10)
With best regards,
Mostafa.
You want to rescale all the images (repeating the dataset twice) and flip half of them.
Use only one augmentation process for your data. You can use the imgaug library; it works like a Keras Sequential model through which the images flow.
import numpy as np
import imgaug as ia
import imgaug.augmenters as iaa
# Define our sequence of augmentation steps that will be applied to every image
# All augmenters with per_channel=0.5 will sample one value _per image_
# in 50% of all cases. In all other cases they will sample new values
# _per channel_.
seq = iaa.Sequential(
    [
        # apply the following augmenters to most images
        iaa.Fliplr(0.5),  # horizontally flip 50% of all images
    ]
)
You can add any other augmentation you need to this Sequential.
Then use seq as your augmentation function and repeat your dataset.
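To make the "flip half of them" idea concrete, here is a NumPy-only sketch of what a random horizontal flip with probability 0.5 does to a batch. It is an illustration, not the imgaug or Keras implementation; the function flip_half and the random 64x64 batch are hypothetical stand-ins.

```python
import numpy as np

def flip_half(batch, rng, p=0.5):
    """Horizontally flip each image of a (N, H, W, C) batch with probability p."""
    batch = np.asarray(batch)
    flip_mask = rng.random(batch.shape[0]) < p
    out = batch.copy()
    # Reversing the width axis mirrors the selected images left to right.
    out[flip_mask] = batch[flip_mask][:, :, ::-1, :]
    return out, flip_mask

rng = np.random.default_rng(0)
images = rng.random((10, 64, 64, 3))  # hypothetical batch, same size as target_size=(64, 64)
augmented, flipped = flip_half(images, rng)
print(augmented.shape, int(flipped.sum()), "images flipped")
```

Because the flip is decided independently for every image on every pass, a single generator with random flipping covers both the original and the mirrored versions over several epochs, so a second generator is not strictly needed.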

Image Classification with single class dataset using Transfer Learning [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I only have around 1000 images of computers. I need to train a model that can identify whether an image shows a computer or not. I do not have a dataset for not-computer, as it could be anything.
I guess the best method for this would be transfer learning. I am trying to train on a pre-trained VGG19 model, but I do not know how to train a model with only computer images and no non-computer images.
I am new to ML overall, so sorry if the question is not to the point.
No way, I'm sorry. You'll need a lot of non-computer images (at least another 1000). You can take them from anywhere; the more they vary, the easier it is for your model to extract the features that characterize a computer.
Imagine being a baby who is trained to always say "yes" in front of something: the next time you see something, you'll say "yes" no matter what is in front of you.
The same holds for machine learning models: you need positive examples and negative examples, or your model will reach 100% accuracy by always predicting "yes".
If you want to see it mathematically/geometrically, you can treat each sample (in your case an image) as a point in the feature space: draw an axis for each attribute you have (x, y, z and so on), and an image becomes a point in that space.
For simplicity let's consider a 2-dimensional space, meaning each image is described by 2 attributes (not realistic for images, which usually have many features, but imagine feature_1 = number of colors and feature_2 = number of angles). In this example we can simply draw one point per image in a Cartesian graph:
The objective of a classifier is to draw the line that best separates the red dots from the blue dots, i.e. the positive examples from the negative examples.
If you give the model only positive samples (which is what you were going to do), you'll have infinitely many models with 100% accuracy, because you can put the line wherever you want; the only requirement is that it does not "cut" your dataset.
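The point about positive-only data can be checked with a few lines of plain Python. This is a toy sketch: always_yes is a deliberately useless classifier, and the two datasets are hypothetical placeholders rather than real images.

```python
def always_yes(image):
    """A degenerate 'classifier' that ignores its input."""
    return 1  # 1 = "computer", 0 = "not computer"

def accuracy(predict, samples, labels):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(predict(s) == y for s, y in zip(samples, labels))
    return correct / len(labels)

# Hypothetical data: a positive-only set and a balanced set.
positives_only = [("img", 1)] * 1000
mixed = [("img", 1)] * 500 + [("img", 0)] * 500

acc_pos = accuracy(always_yes, *zip(*positives_only))
acc_mixed = accuracy(always_yes, *zip(*mixed))
print(acc_pos)    # 1.0
print(acc_mixed)  # 0.5
```

The same do-nothing model looks perfect on the positive-only set and is no better than a coin flip once negatives are present, which is why the negative examples are needed.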
Since I suppose you are a beginner, I'll just tell you what to do, not how, because that would take years ;)
1) Collect data - as I said, including negative examples, at least another 1000 samples
2) Split the data into train/test - a good split is 2/3 of the samples in the training set and 1/3 in the test set. [REMEMBER] Keep the class distribution consistent, i.e. if the full dataset is 50%-50% "Computer"-"Non computer", keep that ratio in both the training set and the test set
3) Train a model - have a look at this link for a guided example; it uses the MNIST dataset, a famous image classification dataset, but you should use your own data
4) Test the model on the test set and look at its performance
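Step 2 can be sketched with the standard library only. The helper stratified_split and the file names are hypothetical; in practice a library routine such as scikit-learn's train_test_split with its stratify argument does the same job.

```python
import random
from collections import defaultdict

def stratified_split(samples, labels, train_fraction=2 / 3, seed=0):
    """Split into train/test so each class keeps the same share in both sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    train, test = [], []
    for label, items in by_class.items():
        rng.shuffle(items)
        cut = round(len(items) * train_fraction)
        train += [(s, label) for s in items[:cut]]
        test += [(s, label) for s in items[cut:]]
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test

# Hypothetical dataset: 1000 computer images and 1000 non-computer images.
samples = ["img_{}.jpg".format(i) for i in range(2000)]
labels = ["computer"] * 1000 + ["not_computer"] * 1000
train, test = stratified_split(samples, labels)
print(len(train), len(test))
```

Splitting each class separately, instead of shuffling everything together, is what keeps the 50%-50% ratio identical in the training and test sets.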
While it is not impossible to take data belonging to only one class and then use methods to classify whether other data belong to the same class or not, you usually do not end up with very good accuracy that way.
One way to do this is to use so-called "autoencoders". The point here is that you use the same image as input and as target, and you make sure that the model (usually a neural network) is forced to compress the image in some way, so that it only stores what is important to recreate images of computers. Ideally, this leads to a model which is good at recreating images of computers and bad at everything else, meaning you can check how high the loss is on the output, and if it is higher than some threshold you've decided on, you deem the image to be something else. Again, you're probably not going to get anything close to 90% accuracy doing this, but it is an approach to your problem.
A better approach is to go hunting for models which have been pre-trained on a dataset that included computers, take the same dataset and assign all computers to one class (plus your own images; make sure they adhere to the dataset format) and a selection of the other images to the other class. Make sure not to make the classes too unbalanced, otherwise your model will suffer from it. Extend the pre-trained model with a couple of layers (fully connected ones should probably do fine), and make the pre-trained part of the model non-trainable, so you don't mess up the good weights there when you're practically telling it to ignore everything which is not a computer.
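The thresholding step of the autoencoder approach can be sketched without any network. The loss values below are made-up numbers standing in for reconstruction losses of a trained autoencoder, and the "mean plus k standard deviations" rule is just one assumed way to pick the threshold.

```python
import statistics

def fit_threshold(train_losses, k=3.0):
    """Pick a loss threshold from losses measured on known computer images."""
    return statistics.mean(train_losses) + k * statistics.pstdev(train_losses)

def is_computer(loss, threshold):
    """Low reconstruction loss means the image resembles the training class."""
    return loss <= threshold

# Hypothetical reconstruction losses on the computer training images.
train_losses = [0.010, 0.012, 0.011, 0.009, 0.013]
threshold = fit_threshold(train_losses)

print(is_computer(0.011, threshold))  # True
print(is_computer(0.200, threshold))  # False
```

A larger k accepts more images as computers (fewer false negatives, more false positives), so the value should be tuned on held-out data.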
This is probably your best bet, but is going to require a bit more effort on your side in terms of finding all of these parts which you need to make it happen, and to understand how to integrate that code into yours.
One option is transfer learning with a model pre-trained on the ImageNet dataset. As mentioned in another answer, there are a bunch of classes in ImageNet close to computers and electronic devices (such as monitors, CD players, laptops, speakers, etc.). So you can fine-tune the model on your dataset and train it to predict computers (train on around 750 images and test on the remaining 250).
Alternatively, you can manually collect images of objects other than computers, preferably a lot of electronic devices (because they are close to computers) and a bunch of other household things (there is a home objects dataset by Caltech). You should collect about 1000 such images to have class balance. You can train your own custom model once you have this dataset.
No problem!
Step one: install a deep-learning toolkit of your choice. They all come with nice tutorials these days.
Step two: grab a pre-trained ImageNet model. That model already has a few computer classes built in ("desktop_computer", "laptop", "notebook", and another class for hand-held computers, "hand-held_computer").
Step three: use the model to predict. For this, your images need to be the correct size.
More steps: further fine-tune the model. This is a bit more advanced but will give you some gains.
Something to think about: what is your goal? Accuracy? Few false positives/negatives? It's always good to have a goal for what you need to accomplish from the start.
EDIT: probably the easiest way to get started (if you don't have the libraries, a GPU, etc.) is to go to Google Colab ( https://colab.research.google.com/notebooks/welcome.ipynb ), make a notebook in your browser and run the following code.
# some code taken and modified from https://www.learnopencv.com/keras-tutorial-using-pre-trained-imagenet-models/
import keras
import numpy as np
from keras.applications import vgg16
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.applications.imagenet_utils import decode_predictions
import matplotlib.pyplot as plt
from PIL import Image
import requests
from io import BytesIO
%matplotlib inline
vgg_model = vgg16.VGG16(weights='imagenet')
def predict_image(image_url, model):
    response = requests.get(image_url)
    original = Image.open(BytesIO(response.content))
    newsize = (224, 224)
    original = original.resize(newsize)
    # convert the PIL image to a numpy array
    # In PIL - image is in (width, height, channel)
    # In Numpy - image is in (height, width, channel)
    numpy_image = img_to_array(original)
    # Convert the image / images into batch format
    # expand_dims will add an extra dimension to the data at a particular axis
    # We want the input matrix to the network to be of the form (batchsize, height, width, channels)
    # Thus we add the extra dimension to the axis 0.
    image_batch = np.expand_dims(numpy_image, axis=0)
    plt.imshow(np.uint8(image_batch[0]))
    plt.show()
    # prepare the image for the VGG model
    processed_image = vgg16.preprocess_input(image_batch.copy())
    # get the predicted probabilities for each class
    predictions = model.predict(processed_image)
    # convert the probabilities to class labels
    # We will get top 5 predictions which is the default
    label = decode_predictions(predictions)
    print(label[0][0:2])  # just display top 2
urls = ['https://4.imimg.com/data4/CO/YS/MY-29352968/samsung-desktop-computer-500x500.jpg', 'https://cdn.britannica.com/77/170477-050-1C747EE3/Laptop-computer.jpg']
for u in urls:
    predict_image(u, vgg_model)
This should be a good starting point. Oh, and if the top predicted label is not in the computer, laptop, etc set, then it's NOT a computer!
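The final check ("if the top predicted label is not in the computer set, it's not a computer") could look like the sketch below. It assumes the output layout of decode_predictions, a list with one entry per image, each a list of (class_id, label, score) tuples. The class ids and scores in the examples are placeholders, not real ImageNet ids, and the label set is the one named in step two.

```python
# Labels taken from the answer above; adjust to the classes you care about.
COMPUTER_LABELS = {"desktop_computer", "laptop", "notebook", "hand-held_computer"}

def is_computer(decoded, top_k=1):
    """Return True if any of the top_k labels of the first image is a computer class."""
    top = decoded[0][:top_k]
    return any(label in COMPUTER_LABELS for _, label, _ in top)

# Hypothetical decode_predictions-style results (ids and scores are made up).
example = [[("id_a", "laptop", 0.81), ("id_b", "notebook", 0.12)]]
other = [[("id_c", "tabby", 0.70), ("id_d", "laptop", 0.05)]]

print(is_computer(example))         # True
print(is_computer(other))           # False
print(is_computer(other, top_k=2))  # True
```

Raising top_k trades false negatives for false positives, which ties back to deciding up front what the goal is.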

Train High Definition images with Tensorflow and inception V3 pre trained model

I'm looking to do some image classification on PDF documents that I convert to images. I'm using the TensorFlow Inception v3 pre-trained model and trying to retrain the last layer with my own categories, following the TensorFlow tutorial. I have ~1000 training images per category and only 4 categories. With 200k iterations I can reach up to 90% successful classifications, which is not bad but still needs some work.
The issue here is that this pre-trained model only takes 300*300-pixel images as input. Obviously that badly degrades the characters involved in the features I try to recognize in the documents.
Would it be possible to alter the model's input layer so I can give it images with better resolution?
Would I get better results with a home-made and much simpler model?
If so, where should I start to build a model for such image classification?
If you want to use a different image resolution than the one the pre-trained model uses, you should use only the convolution blocks and add a set of fully connected blocks sized for the new input. Using a higher-level library like Keras makes this a lot easier. Below is an example of how to do that in Keras.
import keras
from keras.layers import Flatten,Dense,GlobalAveragePooling2D
from keras.models import Model
from keras.applications.inception_v3 import InceptionV3
base_model = InceptionV3(include_top=False,input_shape=(600,600,3),weights='imagenet')
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024,activation='relu')(x)
#Add as many dense layers / Fully Connected layers required
pred = Dense(10,activation='softmax')(x)
model = Model(base_model.input,pred)
for l in model.layers[:-3]:
    l.trainable = False
Setting include_top=False gives you only the convolution blocks. Use input_shape=(600,600,3) to set the shape you want. You can then add a couple of dense (fully connected) layers to the model. The last layer should have as many units as there are categories; the 10 here represents the number of classes. With this approach you reuse all the weights of the pre-trained convolution layers and train only the final dense layers.
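To see why the larger input works with the same convolution weights, the sketch below estimates the spatial size of the last feature map for a given input size. It is based on my reading of the Keras InceptionV3 layout (only the unpadded, strided layers change the size), so treat the layer list as an assumption rather than an official specification.

```python
def conv_out(size, kernel, stride):
    """Output length of an unpadded ('valid') convolution or pooling window."""
    return (size - kernel) // stride + 1

def inception_v3_feature_size(input_size):
    """Spatial size of the final feature map for a square input (assumed layout)."""
    # (kernel, stride) of the layers that shrink the map: stem convolutions,
    # two max-pools, and the two grid-reduction blocks.
    shrinking_layers = [(3, 2), (3, 1), (3, 2), (3, 1), (3, 2), (3, 2), (3, 2)]
    size = input_size
    for kernel, stride in shrinking_layers:
        size = conv_out(size, kernel, stride)
    return size

print(inception_v3_feature_size(299))  # 8
print(inception_v3_feature_size(600))  # 17
```

The feature map grows from 8x8 to 17x17, but GlobalAveragePooling2D averages over the spatial axes, so the Dense layers receive a vector of the same length either way; that is why the head in the example does not depend on the input resolution.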

How do I obtain the layer names for use in the iOS sample app? (Tensorflow)

I'm very new to Tensorflow, and I'm trying to train something using the inception v3 network for use in an iPhone app. I managed to export my graph as a protocolbuffer file, manually remove the dropout nodes (correctly, I hope), and have placed that .pb file in my iOS project, but now I am receiving the following error:
Running model failed:Not found: FeedInputs: unable to find feed output input
which seems to indicate that my input_layer_name and output_layer_name variables in the iOS app are misconfigured.
I see in various places that it should be Mul and softmax respectively, for inception v3, but these values don't work for me.
My question is: what is a layer (with regards to this context), and how do I find out what mine are?
This is the exact definition of the model that I trained, but I don't see "Mul" or "softmax" present.
This is what I've been able to learn about layers, but it seems to be a different concept, since "Mul" isn't present in that list.
I'm worried that this might be a duplicate of this question but "layers" aren't explained (are they tensors?) and graph.get_operations() seems to be deprecated, or maybe I'm using it wrong.
As MohamedEzz wrote, there are no layers in TensorFlow graphs. There are only operations, which can be placed under the same name scope.
Usually the operations of a single layer are placed under the same scope, and applications that are aware of the name scope concept can display them grouped.
One such application is TensorBoard. I believe that using TensorBoard is the easiest way to find node names.
Consider the following example:
import tensorflow as tf
import tensorflow.contrib.slim.nets as nets
input_placeholder = tf.placeholder(tf.float32, shape=(None, 224, 224, 3))
network = nets.inception.inception_v3(input_placeholder)
writer = tf.summary.FileWriter('.', tf.get_default_graph())
writer.close()
It creates a placeholder for the input data, then creates the Inception v3 network and saves the event data (with the graph) in the current directory.
Launching TensorBoard in the same directory makes it possible to view the graph structure.
tensorboard --logdir .
Tensorboard prints UI url to the console
Starting TensorBoard 41 on port 6006
(You can navigate to http://192.168.128.73:6006)
Below is an image of this graph.
Locate the node you are interested in and select it to find its name (in the upper left information pane).
Input: Placeholder
Output: InceptionV3/Predictions/Softmax
Please note that you usually need tensor names, not node names. In most cases it is enough to append :0 to the node name to get the tensor name.
For example, to run the Inception v3 network created above using names from the graph, use the following code (a continuation of the code above):
import numpy as np
data = np.random.randn(1, 224, 224, 3) # just random data
session = tf.InteractiveSession()
session.run(tf.global_variables_initializer())
result = session.run('InceptionV3/Predictions/Softmax:0', feed_dict={'Placeholder:0': data})
# result.shape = (1, 1000)
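The node-name to tensor-name convention used above can be written as a tiny helper. The function tensor_name is hypothetical (it is not part of TensorFlow); it only formats the string that session.run accepts.

```python
def tensor_name(node_name, output_index=0):
    """Build the name of a node's output tensor: '<node name>:<output index>'."""
    return "{}:{}".format(node_name, output_index)

print(tensor_name("Placeholder"))                      # Placeholder:0
print(tensor_name("InceptionV3/Predictions/Softmax"))  # InceptionV3/Predictions/Softmax:0
```

The index matters only for operations with several outputs; for most nodes the first output, index 0, is the one you want.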
In the core of tensorflow, there are ops (operations) and tensors (n-dimensional arrays). Each op takes tensors and gives back tensors. Layers are just convenience wrappers around a number of ops that represent a neural network layer.
For example, a convolution layer is composed mainly of 3 ops:
conv2d op: slides a kernel over the input tensor, multiplies the kernel element-wise with the underlying input window, and sums the products.
bias_add op : adds the biases to the tensor coming out of the conv2d op
activation op : applies an activation function element-wise to the output tensor of the bias_add op
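The three ops can be imitated in plain NumPy to show how they chain together. This is a simplified single-channel sketch with made-up values, not TensorFlow's implementation (no batch axis, no padding, stride 1).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image; multiply element-wise and sum each window."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(window * kernel)
    return out

def bias_add(x, bias):
    """Add the bias to every position of the feature map."""
    return x + bias

def relu(x):
    """Element-wise activation: negative values become zero."""
    return np.maximum(x, 0.0)

image = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 input
kernel = np.array([[-1.0, 0.0], [0.0, 1.0]])      # toy 2x2 kernel
bias = 0.5                                        # hypothetical bias value

features = relu(bias_add(conv2d_valid(image, kernel), bias))
print(features.shape)  # (3, 3)
```

Each function takes a tensor and returns a tensor, which is exactly how the ops are wired in the graph; the "layer" is just the composition of the three.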
To run a tensorflow model, you provide feeds (inputs) and fetches (desired outputs). These are tensors, or tensor names.
From this line of code Inception_model, it seems that what you need is a tensor named 'predictions' which has the n_class output probabilities.
What you observed (softmax) is the type of the op that produced the predictions tensor
As for the input tensor name, the inception_model.py code does not show the input tensor name, since it's an argument to the function. So it depends on what name you have given to that input tensor.
When you create your layers or variables, add the parameter called name:
with tf.name_scope("output"):
    W2 = tf.Variable(tf.truncated_normal([num_filters, num_classes], stddev=0.1), name="W2")
    b2 = tf.Variable(tf.constant(0.1, shape=[num_classes]), name="b2")
    scores = tf.nn.xw_plus_b(h_pool_flat, W2, b2, name="scores")
    pred_y = tf.nn.softmax(scores, name="pred_y")
In this case I can access the final predicted values by using "output/pred_y". If you don't have a name_scope, you can just use "pred_y" to get the values.
conv = tf.nn.conv1d(word_embeddedings,
W1,
stride=stride_size,
padding="VALID",
name="conv") #will have dimensions [batch_size,out_width,num_filters] out_width is a function of max_words,filter_size and stride_size
# Apply nonlinearity
h = tf.nn.relu(tf.nn.bias_add(conv, b1), name="relu")
I called the layer "conv" and used it in the next layer. Paste your snippet like I have done here

Feeding image into tensorflow

I am new to using TensorFlow, so I was trying the MNIST tutorials in ML for beginners. The code runs just fine. But what if I want to input an image of my own, which has, say, a handwritten number on it, and see whether it predicts what number it might be? How do I feed my own image into the TensorFlow program?
Assuming you're using this file.
If you look at x, the shape is [None, 784]. To feed your own image in, you'll have to store the image as a variable (loading it using PIL or OpenCV or something), flatten it, wrap it in a list, and pass it to the graph in the feed_dict, looking something like this:
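# y is the model output in that tutorial (y_ is the label placeholder); NumPy has no np.flatten, so use np.ravel
sess.run(y, feed_dict={x: [np.ravel(image_you_loaded_in)]})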
It will need to be a 28x28 image in order for this code to work without modification.
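The flattening step can be sketched with NumPy alone. The random array stands in for a 28x28 grayscale image you have loaded (an assumption so the snippet runs without a file), and the scaling to the 0 to 1 range assumes the tutorial's MNIST data is normalized that way.

```python
import numpy as np

# Stand-in for a loaded 28x28 grayscale image with pixel values 0..255.
image = np.random.randint(0, 256, size=(28, 28), dtype=np.uint8)

# Scale to floats in [0, 1] and flatten into a batch of one 784-element row.
pixels = image.astype(np.float32) / 255.0
batch = pixels.reshape(1, 784)

print(batch.shape)  # (1, 784)
```

An array of this shape matches the [None, 784] placeholder and can be passed directly as the value for x in feed_dict.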

Resources