I am working to understand CoreML. For a starter model, I've downloaded Yahoo's Open NSFW caffemodel. You give it an image, it gives you a probability score (between 0 and 1) that the image contains unsuitable content.
Using coremltools, I've converted the model to a .mlmodel and brought it into my app. It appears in Xcode like so:
In my app, I can successfully pass an image, and the output appears as an MLMultiArray. Where I am having trouble is understanding how to use this MLMultiArray to obtain my probability score. My code is:
func testModel(image: CVPixelBuffer) throws {
    let model = myModel()
    let prediction = try model.prediction(data: image)
    let output = prediction.prob // MLMultiArray
    print(output[0]) // 0.9992402791976929
    print(output[1]) // 0.0007597212097607553
}
For reference, the CVPixelBuffer is being resized to the 224x224 size the model requires (I'll get into playing with Vision once I can figure this out).
The two indexes I've printed to the console do change if I provide a different image, but their scores differ wildly from the result I get when I run the model in Python. The same image passed into the model in Python gives me an output of 0.16, whereas my Core ML output, per the example above, is far from what I expect to see (and is a multi-value array, unlike Python's single double output).
Is more work necessary to get a result like I am expecting?
It seems like you are not transforming the input image in the same way the model expects.
Most Caffe models expect "mean-subtracted" images as input, and so does this one. Look at the Python code provided with Yahoo's Open NSFW (classify_nsfw.py):
# Note that the parameters are hard-coded for best results
caffe_transformer = caffe.io.Transformer({'data': nsfw_net.blobs['data'].data.shape})
caffe_transformer.set_transpose('data', (2, 0, 1)) # move image channels to outermost
caffe_transformer.set_mean('data', np.array([104, 117, 123])) # subtract the dataset-mean value in each channel
caffe_transformer.set_raw_scale('data', 255) # rescale from [0, 1] to [0, 255]
caffe_transformer.set_channel_swap('data', (2, 1, 0)) # swap channels from RGB to BGR
The script also resizes the image to 256x256 in a specific way and then crops it to 224x224.
To obtain the same results, you'll need to transform your input image in exactly the same way on both platforms.
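To make the transformer settings above concrete, here is a minimal NumPy sketch of what they do to one image. It assumes the image has already been resized to 256x256 RGB uint8 (e.g. with PIL) and that the crop is a center crop; the function name preprocess_like_caffe is hypothetical, not part of Caffe or Open NSFW.

```python
import numpy as np

# Per-channel means in BGR order, as hard-coded in classify_nsfw.py
BGR_MEAN = np.array([104.0, 117.0, 123.0], dtype=np.float32)

def preprocess_like_caffe(rgb_256: np.ndarray, crop: int = 224) -> np.ndarray:
    """Hypothetical helper mimicking the Caffe transformer on a resized RGB image.

    Assumes rgb_256 is HxWx3 uint8 in [0, 255] (already resized, e.g. to 256x256)
    and that a center crop is wanted. Returns a (3, crop, crop) float32 array.
    """
    h, w, _ = rgb_256.shape
    top = (h - crop) // 2
    left = (w - crop) // 2
    # Center crop; values stay in [0, 255], which is what set_raw_scale(255) restores
    cropped = rgb_256[top:top + crop, left:left + crop, :].astype(np.float32)
    bgr = cropped[:, :, ::-1]            # RGB -> BGR (set_channel_swap)
    bgr = bgr - BGR_MEAN                 # subtract per-channel mean (set_mean)
    return np.transpose(bgr, (2, 0, 1))  # HWC -> CHW (set_transpose)
```

Running the same image through this on the Python side and through whatever preprocessing the app does is one way to locate where the two pipelines diverge.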
See this thread for additional information.
I am trying to do my own object detection using my own dataset. I started my first machine learning program from the Google TensorFlow Object Detection API; the notebook is here: eager_few_shot_od_training_tf2_colab.ipynb
In the Colab tutorial, the author uses JavaScript to label the images, and the result looks like this:
import numpy as np

gt_boxes = [
    np.array([[0.436, 0.591, 0.629, 0.712]], dtype=np.float32),
    np.array([[0.539, 0.583, 0.73, 0.71]], dtype=np.float32),
    np.array([[0.464, 0.414, 0.626, 0.548]], dtype=np.float32),
    np.array([[0.313, 0.308, 0.648, 0.526]], dtype=np.float32),
    np.array([[0.256, 0.444, 0.484, 0.629]], dtype=np.float32),
]
In my own program, I use labelImg instead of the JavaScript tool, but the resulting dataset is not compatible.
Now I have two questions. First, what dataset format does the Colab tutorial use: COCO, YOLO, VOC, or something else? Second, how do I convert between labelImg data and the Colab tutorial's data? My goal is to label data with labelImg and then substitute it into the Colab tutorial.
The "data type" is just ratio values based on the height and width of the image: the coordinates are ratios that mark where the bounding box starts and ends. Each image is preprocessed, meaning its dimensions are changed when it is fed into the model as (batch, height, width, channel), so the bounding box coordinates must be ratios, because the image may change size from its original dimensions.
For example, the model expects images to be 640x640, so an 800x600 image has to be resized. If the model returned the coordinates [100, 100, 150, 150] for a 640x640 image, those pixel values would clearly not describe the same region of the 800x600 original.
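The arithmetic behind that example can be sketched as follows. This assumes the box order used by the TensorFlow Object Detection API, [ymin, xmin, ymax, xmax]; the helper names are hypothetical.

```python
def to_normalized(box, width, height):
    """Hypothetical helper: [ymin, xmin, ymax, xmax] in pixels -> ratios of image size."""
    ymin, xmin, ymax, xmax = box
    return [ymin / height, xmin / width, ymax / height, xmax / width]

def to_pixels(box, width, height):
    """Hypothetical helper: [ymin, xmin, ymax, xmax] ratios -> pixels for a given image size."""
    ymin, xmin, ymax, xmax = box
    return [ymin * height, xmin * width, ymax * height, xmax * width]

# A box predicted on the 640x640 model input...
norm = to_normalized([100, 100, 150, 150], width=640, height=640)  # [0.15625, 0.15625, 0.234375, 0.234375]
# ...mapped back onto the 800x600 original
orig = to_pixels(norm, width=800, height=600)                      # [93.75, 125.0, 140.625, 187.5]
```

The ratios are the same for both image sizes; only the conversion back to pixels depends on the size you multiply by.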
To get this data format, use the PascalVOC output format in labelImg and convert its pixel coordinates to ratios.
The typical way to do this is to create TFRecord files and decode them in your training script to build datasets. However, you are free to build a TensorFlow dataset for training with whatever method you like.
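As a sketch of that conversion, the function below reads one labelImg PascalVOC XML annotation (the standard size/width, size/height and object/bndbox elements) and produces one array per image in the same shape as the tutorial's gt_boxes entries. It assumes the [ymin, xmin, ymax, xmax] order used by the Object Detection API; the function name is hypothetical. For a file on disk you could pass ET.parse(path).getroot() logic instead of a string.

```python
import xml.etree.ElementTree as ET
import numpy as np

def voc_xml_to_gt_boxes(xml_text: str) -> np.ndarray:
    """Hypothetical helper: PascalVOC XML (as written by labelImg) -> (n_objects, 4) float32.

    Each row is [ymin, xmin, ymax, xmax] normalized by the image height/width.
    """
    root = ET.fromstring(xml_text)
    width = float(root.find("size/width").text)
    height = float(root.find("size/height").text)
    boxes = []
    for obj in root.findall("object"):
        bb = obj.find("bndbox")
        xmin = float(bb.find("xmin").text)
        ymin = float(bb.find("ymin").text)
        xmax = float(bb.find("xmax").text)
        ymax = float(bb.find("ymax").text)
        boxes.append([ymin / height, xmin / width, ymax / height, xmax / width])
    return np.array(boxes, dtype=np.float32)

example = """<annotation>
  <size><width>800</width><height>600</height><depth>3</depth></size>
  <object><name>duck</name>
    <bndbox><xmin>80</xmin><ymin>60</ymin><xmax>400</xmax><ymax>300</ymax></bndbox>
  </object>
</annotation>"""

gt_boxes = [voc_xml_to_gt_boxes(example)]  # one entry per annotated image
```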
Hope this answered your questions.
I have retrained and fine-tuned Inception_v3 using Keras (2.0.4) and TensorFlow (1.1.0). When I convert the Keras model to an .mlmodel with coremltools, I get a model that requires a MultiArray input.
That makes sense if it is asking for [Height, Width, RGB] = (299, 299, 3), but I don't know how to convert the CVPixelBuffer to that format.
Can someone please help me understand what preprocessing needs to take place for my retrained Inception model to work in Core ML? Or what I need to do during conversion so that it will accept a CVPixelBuffer?
I had retrained InceptionV3, so I went back to look at my code. I had not set the input shape to 299x299 in Keras; instead, I forced all my photos to that size during preprocessing. As a result, the model JSON did not contain the input dimensions but instead had the values [null, null, null, 3], so the conversion to Core ML could not know that the input dimensions were supposed to be 299x299. I was able to save the model weights, save the model's JSON string, edit the JSON to have the proper input shape [null, 299, 299, 3], load the edited JSON string as the new model, load the weights, and voilà! The Core ML model now properly accepts an image input.
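The JSON-editing step can be sketched like this. It assumes the layout of a Keras 2.x functional model's to_json() output (config, then layers, with an InputLayer whose config holds batch_input_shape); set_input_shape is a hypothetical helper, and the example JSON is a trimmed stand-in rather than a real model.

```python
import json

def set_input_shape(model_json: str, height: int, width: int, channels: int = 3) -> str:
    """Hypothetical helper: return a copy of a Keras model JSON string with every
    InputLayer's batch_input_shape set to [None, height, width, channels]."""
    spec = json.loads(model_json)
    for layer in spec["config"]["layers"]:
        if layer["class_name"] == "InputLayer":
            layer["config"]["batch_input_shape"] = [None, height, width, channels]
    return json.dumps(spec)  # None is written back as null

# Trimmed stand-in for model.to_json() output (assumed layout)
sample = ('{"class_name": "Model", "config": {"layers": ['
          '{"class_name": "InputLayer", "config": {"name": "input_1", "batch_input_shape": [null, null, null, 3]}},'
          '{"class_name": "Conv2D", "config": {"name": "conv2d_1"}}]}}')
fixed = json.loads(set_input_shape(sample, 299, 299))
```

With a real model, the edited string would go to keras.models.model_from_json, followed by set_weights(old_model.get_weights()) (or load_weights on a saved weights file) before converting with coremltools.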
That's a very good question. It seems that the pixel buffer is almost always in BGRA. That does not crash Inception, and classes are predicted quite well, but the values and vectors are different. I bet Core ML does not convert BGRA to RGB, so the channels are in the wrong order. I have not been able to find any way to do that conversion for a pixel buffer in Swift; please let me know if one exists.
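One way to test the channel-order hypothesis from the Python side is to feed the Keras model the same image in both orders and see which output matches the app. The reordering itself is a single NumPy index; this sketch assumes an HxWx4 array holding the pixel buffer's bytes in B, G, R, A order, and the helper name is hypothetical.

```python
import numpy as np

def bgra_to_rgb(bgra: np.ndarray) -> np.ndarray:
    """Hypothetical helper: drop alpha and reorder B, G, R -> R, G, B for an HxWx4 array."""
    return bgra[..., [2, 1, 0]]

pixel = np.array([[[10, 20, 30, 255]]], dtype=np.uint8)  # one BGRA pixel
rgb = bgra_to_rgb(pixel)
```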
I'm trying to create an example using the Keras built in the latest version of TensorFlow from Google. This example should be able to classify a classic image of an elephant. The code looks like this:
# Import a few libraries for use later
from PIL import Image as IMG
from tensorflow.contrib.keras.python.keras.preprocessing import image
from tensorflow.contrib.keras.python.keras.applications.inception_v3 import InceptionV3
from tensorflow.contrib.keras.python.keras.applications.inception_v3 import preprocess_input, decode_predictions
# Get a copy of the Inception model
print('Loading Inception V3...\n')
model = InceptionV3(weights='imagenet', include_top=True)
print ('Inception V3 loaded\n')
# Read the elephant JPG
elephant_img = IMG.open('elephant.jpg')
# Convert the elephant to an array
elephant = image.img_to_array(elephant_img)
elephant = preprocess_input(elephant)
elephant_preds = model.predict(elephant)
print ('Predictions: ', decode_predictions(elephant_preds))
Unfortunately I'm getting an error when trying to evaluate the model with model.predict:
ValueError: Error when checking : expected input_1 to have 4 dimensions, but got array with shape (299, 299, 3)
This code is based on the excellent coremltools-keras-inception example and will be expanded once this is figured out.
This error occurs because the model always expects a batch of examples, not a single example. This diverges from the common understanding of models as mathematical functions of their inputs. Models expect batches because:
Models are computationally designed to work faster on batches, which speeds up training.
Some algorithms take the batch nature of the input into account (e.g. Batch Normalization or GAN training tricks).
So the four dimensions are a sample (batch) dimension followed by the three image dimensions.
Actually, I found the answer. Whatever the documentation suggests, when the top layer is included the input is still set up to take a batch of images. So we need to add this line before the prediction:
elephant = numpy.expand_dims(elephant, axis=0)
Then the tensor is in the right shape and everything works correctly. I am still uncertain why the documentation states that the input vector should be (3x299x299) or (299x299x3) when it clearly wants 4 dimensions.
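To illustrate the shapes involved, here is a small NumPy sketch (the zero image is a stand-in for the real elephant array) showing expand_dims, the equivalent np.newaxis indexing, and stacking several images into one batch.

```python
import numpy as np

single = np.zeros((299, 299, 3), dtype=np.float32)  # one image, (height, width, channels)
batch_of_one = np.expand_dims(single, axis=0)        # add the batch dimension in front
same_thing = single[np.newaxis, ...]                 # equivalent indexing form
several = np.stack([single, single, single])         # three images in one batch
```

model.predict on batch_of_one returns one row of predictions per image in the batch, so the result for the single image is the first row.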
Be careful!
I would like to generate augmented data for images using random rotations, shifts, shears and flips.
I have found the Keras class keras.preprocessing.image.ImageDataGenerator, but I've only seen it used to train networks directly.
Is there a way to feed in images and then save the transformed images to disk, instead of using them directly as in the examples at this link?
Or is there another simple plug-and-play Python package I can use instead of implementing everything with NumPy or OpenCV?
Basically, this is a generator that returns batches of images indefinitely. One could do the following:
def save_images_from_generator(maximal_nb_of_images, generator):
    nb_of_images_processed = 0
    for x, _ in generator:
        nb_of_images_processed += x.shape[0]
        if nb_of_images_processed > maximal_nb_of_images:
            break  # the generator never ends on its own, so stop explicitly
        for image_nb in range(x.shape[0]):
            your_custom_save(x[image_nb])  # your custom function for saving images
to save images from the Keras image generator.
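A self-contained variant of the same idea, with a hypothetical stand-in generator instead of datagen.flow(...) and np.save standing in for the custom save function, flattens the batches into single images and uses itertools.islice so the infinite generator stops after exactly n images.

```python
import itertools
import os
import tempfile
import numpy as np

def fake_batches(batch_size=4, shape=(8, 8, 3)):
    """Hypothetical stand-in for datagen.flow(...): yields (x, y) batches forever."""
    rng = np.random.default_rng(0)
    while True:
        yield rng.random((batch_size, *shape), dtype=np.float32), np.zeros(batch_size)

def save_first_n(generator, n, out_dir):
    """Save the first n individual images from an infinite batch generator as .npy files."""
    images = (img for x, _ in generator for img in x)  # flatten batches into single images
    paths = []
    for i, img in enumerate(itertools.islice(images, n)):
        path = os.path.join(out_dir, f"aug_{i:05d}.npy")
        np.save(path, img)
        paths.append(path)
    return paths

with tempfile.TemporaryDirectory() as out_dir:
    saved = save_first_n(fake_batches(), 10, out_dir)
    first_shape = np.load(saved[0]).shape
    n_files = len(os.listdir(out_dir))
```

Unlike the batch-level version, this saves exactly n images even when n is not a multiple of the batch size.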
You can save the images output by ImageDataGenerator to disk. One option is to use datagen.flow as follows:
for X_batch, y_batch in datagen.flow(X_train, y_train, batch_size=9, save_to_dir='images', save_prefix='aug', save_format='png'):
    break  # flow() loops forever; each batch drawn writes 9 augmented PNGs to the existing 'images' directory
A second option is to manually loop over each image, load it, and apply a random transformation. Once you have instantiated your ImageDataGenerator, just call:
img_trans = datagen.random_transform(img)
Then, save the transformed image to disk using PIL or a similar library.
A third option is to manually loop over each image, load it, and apply a random transformation using a third-party library. I recommend imgaug, found here.
I have trained an estimator called clf using the fit method and saved the model to disk. The next time the program runs, it loads clf from disk.
My problems are:
How do I predict a sample that is saved on disk? That is, how do I load it and predict?
How do I get the sample's label name instead of the integer label after predicting?
How do I predict a sample that is saved on disk? That is, how do I load it and predict?
You have to use the same array representation for the new samples as the one used for the samples passed to the fit method. If you want to predict a single sample, the input must be a 2D NumPy array with shape (1, n_features).
How to read your original file from disk and convert it to a NumPy array suitable for the classifier is a domain-specific issue: it depends on whether you are classifying text files, JPEG files, frames in a video file, rows in a database, log lines from syslog-monitored services...
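A minimal sketch of the load-and-predict round trip, assuming a toy 2-feature dataset and a KNeighborsClassifier standing in for your clf; pickle is used here for persistence (joblib.dump and joblib.load are a common alternative). The key step is reshaping one feature vector to (1, n_features).

```python
import pickle
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training data standing in for your real features
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
y_train = np.array([0, 0, 1, 1])
clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

blob = pickle.dumps(clf)      # in practice, write this to a file on disk
restored = pickle.loads(blob) # ...and read it back in the next run

one_sample = np.array([10.0, 10.5])                 # shape (n_features,)
pred = restored.predict(one_sample.reshape(1, -1))  # reshape to (1, n_features)
```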
How do I get the sample's label name instead of the integer label after predicting?
Just keep a list of label names and make sure the integers used as target values when fitting are in the range [0, n_classes). For instance, with ['ham', 'spam'] and predictions that are 0 or 1, you can do:
new_samples = ...  # 2D array with shape (n_samples, n_features)
label_names = ['ham', 'spam']
predictions = [label_names[pred] for pred in clf.predict(new_samples)]
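Two alternatives that avoid maintaining the list by hand, sketched on a toy dataset: scikit-learn classifiers accept string targets directly and then predict those strings, and LabelEncoder can map integer predictions back to names with inverse_transform. The data and the KNeighborsClassifier choice are assumptions for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
names = np.array(['ham', 'ham', 'spam', 'spam'])

# Option 1: fit on the string labels themselves
clf_str = KNeighborsClassifier(n_neighbors=1).fit(X, names)
pred_str = clf_str.predict([[10.0, 10.5]])

# Option 2: encode to integers for fitting, decode the predictions afterwards
enc = LabelEncoder()
y_int = enc.fit_transform(names)  # classes_ are sorted: ['ham', 'spam'] -> 0, 1
clf_int = KNeighborsClassifier(n_neighbors=1).fit(X, y_int)
pred_names = enc.inverse_transform(clf_int.predict([[0.0, 0.5]]))
```

Option 2 keeps the integer targets while letting the encoder, which you can pickle alongside clf, own the mapping.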