iPhone X TrueDepth image analysis and CoreML - ios

I understand that my question is not directly related to programming itself and looks more like research, but perhaps someone can advise here.
I have an idea for an app: the user takes a photo, and the app analyzes it, cuts away everything except the required object (a piece of clothing, for example), and saves it as a separate image. Until recently this was a very difficult task, because the developer would have to create a pretty good neural network and train it. But now that Apple has released the iPhone X with a TrueDepth camera, half of the problem can be solved. As I understand it, the developer can remove the background much more easily, because the iPhone knows where the background is located.
So only a few questions are left:
I. What is the format of photos taken by the iPhone X with the TrueDepth camera? Is it possible to create a neural network that will be able to use the depth information from the picture?
II. I've read about CoreML and tried some examples, but it's still not clear to me how the following behaviour can be achieved with an external neural network that has been imported into CoreML:
Neural network gets an image as input data.
NN analyzes it and finds the required object in the image.
NN returns not only the determined type of the object, but the cropped object itself, or an array of coordinates/pixels of the area that should be cropped.
The application gets all required information from the NN and performs the necessary actions to crop the image and save it to another file, or whatever.
Any advice will be appreciated.

Ok, your question is actually directly related to programming :)
Ad I. The format is HEIF, but if you are developing an iPhone app you access the image data by means of the iOS APIs, so you can easily get the bitmap as a CVPixelBuffer.
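For the depth part of question I: the TrueDepth data is not baked into the bitmap but delivered alongside the photo as AVDepthData when you capture with AVCapturePhotoOutput and enable depth delivery. A rough sketch of pulling it out as a CVPixelBuffer (the helper name and the conversion to 32-bit disparity are my own illustrative choices):
import AVFoundation
import CoreVideo

// Call this from your AVCapturePhotoCaptureDelegate once the photo arrives.
// photo.depthData is non-nil only if depth delivery was enabled on the output.
func depthMap(from photo: AVCapturePhoto) -> CVPixelBuffer? {
    guard let depthData = photo.depthData else { return nil }
    // Normalize to a known format before feeding it anywhere else.
    let converted = depthData.converting(toDepthDataType: kCVPixelFormatType_DisparityFloat32)
    return converted.depthDataMap
}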
Ad II.
1. Neural network gets an image as input data.
As mentioned above, you want to get your bitmap first, so create a CVPixelBuffer (check out this post for an example). Then you use the CoreML API. You want to use the MLFeatureProvider protocol. An object which conforms to it is where you put your vector data, stored as an MLFeatureValue under a key name picked by you (like "pixelData").
import CoreML

// Exposes the input bitmap to CoreML under the feature name "pixelData".
class YourImageFeatureProvider: MLFeatureProvider {
    let imageFeatureValue: MLFeatureValue
    var featureNames: Set<String> = []

    init(with imageFeatureValue: MLFeatureValue) {
        self.imageFeatureValue = imageFeatureValue
        featureNames.insert("pixelData")
    }

    func featureValue(for featureName: String) -> MLFeatureValue? {
        guard featureName == "pixelData" else {
            return nil
        }
        return imageFeatureValue
    }
}
Then you use it like this; the feature value is created with the init(pixelBuffer:) initializer on MLFeatureValue:
let imageFeatureValue = MLFeatureValue(pixelBuffer: yourPixelBuffer)
let featureProvider = YourImageFeatureProvider(imageFeatureValue: imageFeatureValue)
Remember to crop/scale the image before this operation so that your network is fed a vector of the proper size.
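A minimal sketch of such a resize using Core Image (the helper name and the BGRA output format are assumptions; match whatever your model's input description actually requires):
import CoreGraphics
import CoreImage
import CoreVideo

// Scales a pixel buffer to the width/height the model expects.
func scaled(_ pixelBuffer: CVPixelBuffer, to size: CGSize) -> CVPixelBuffer? {
    let image = CIImage(cvPixelBuffer: pixelBuffer)
    let scaleX = size.width / CGFloat(CVPixelBufferGetWidth(pixelBuffer))
    let scaleY = size.height / CGFloat(CVPixelBufferGetHeight(pixelBuffer))
    let scaledImage = image.transformed(by: CGAffineTransform(scaleX: scaleX, y: scaleY))

    var output: CVPixelBuffer?
    CVPixelBufferCreate(kCFAllocatorDefault,
                        Int(size.width),
                        Int(size.height),
                        kCVPixelFormatType_32BGRA,
                        nil,
                        &output)
    guard let outputBuffer = output else { return nil }
    CIContext().render(scaledImage, to: outputBuffer)
    return outputBuffer
}
Alternatively, if you run the model through Vision (VNCoreMLRequest), the framework handles this scaling for you.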
2. NN analyzes it and finds the required object in the image.
Use the prediction function on your CoreML model.
do {
    let outputFeatureProvider = try yourModel.prediction(from: featureProvider)
    // success! your output feature provider has your data
} catch {
    // your model failed to predict, check the error
}
3. NN returns not only the determined type of the object, but the cropped object itself, or an array of coordinates/pixels of the area that should be cropped.
This depends on your model and whether you imported it correctly. Assuming you did, you access the output data by querying the returned MLFeatureProvider (remember that this is a protocol, but unlike step 1 you don't have to implement it yourself here; the object returned by prediction(from:) already conforms to it, so you just read values with featureValue(for:)), and there you have a bitmap and the rest of the data your NN spits out.
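For example, continuing inside the do block above (the output name "segmentationMask" is purely hypothetical; use whatever output names your converted model actually declares):
if let maskValue = outputFeatureProvider.featureValue(for: "segmentationMask"),
   let maskBuffer = maskValue.imageBufferValue {
    // maskBuffer is a CVPixelBuffer holding the mask the model produced;
    // use it to decide which pixels of the original photo to keep.
    print(CVPixelBufferGetWidth(maskBuffer), CVPixelBufferGetHeight(maskBuffer))
}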
4. The application gets all required information from the NN and performs the necessary actions to crop the image and save it to another file, or whatever.
Just reverse step 1, going from MLFeatureValue -> CVPixelBuffer -> UIImage. There are plenty of questions on SO about this, so I won't repeat the answers.
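One possible sketch of that last conversion, going through Core Image (the helper name is arbitrary):
import UIKit
import CoreImage

// Converts a pixel buffer back into a UIImage so it can be saved or displayed.
func image(from pixelBuffer: CVPixelBuffer) -> UIImage? {
    let ciImage = CIImage(cvPixelBuffer: pixelBuffer)
    let context = CIContext()
    guard let cgImage = context.createCGImage(ciImage, from: ciImage.extent) else {
        return nil
    }
    return UIImage(cgImage: cgImage)
}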
If you are a beginner, don't expect to have results overnight, but the path is there. For an experienced dev I would estimate this work at several hours to get it done (plus model training time and porting it to CoreML).
Apart from CoreML (maybe you will find your model too sophisticated and you won't be able to port it to CoreML), check out Matthijs Hollemans' GitHub (very good resources on different ways of porting models to iOS). He is also around here and knows a lot about the subject.

Related

RL: Self-Play with On-Policy and Off-Policy

I am trying to implement self-play with PPO.
Suppose we have a game with 2 agents. We control one player on each side and get information like observations and rewards after each step. As far as I know, you can use the information from both the left and the right player to generate training data and to optimize the model. But that is only possible for off-policy methods, isn't it?
Because with on-policy methods, e.g. PPO, you expect the training data to be generated by the current network version, and that is usually not the case during self-play?
Thanks!
Exactly, and this is also the reason why you can use experience replay (replay buffers) only for off-policy methods like Q-learning. Using sample steps that were not generated by the current policy violates the mathematical assumptions behind the gradients that are being backpropagated.
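To make that assumption concrete: the policy-gradient estimate that PPO's clipped surrogate builds on is an expectation taken under the current policy (a standard textbook form, not anything implementation-specific):

$$
\nabla_\theta J(\theta) \;=\; \mathbb{E}_{(s_t, a_t) \sim \pi_\theta}\!\left[ \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, \hat{A}_t \right]
$$

PPO's probability ratio $r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_{\text{old}}}(a_t \mid s_t)$ only corrects for small deviations from the policy that actually collected the samples, so data generated by an older version of the network violates that assumption unless you add an explicit off-policy correction such as importance sampling.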

How to make your own custom image dataset?

I am working on a project to detect FOD (Foreign Object Debris) found on runways. FOD includes anything like nuts, bolts, screws, locking wires, plastic debris, stones, etc. that has the potential to cause damage to an aircraft. I have searched the Internet for an image dataset, but no dataset related to FOD is available. My question is: how can I make my own dataset of images that can then be used for training?
Kindly guide me in making an image dataset for both classification and detection purposes, and also in the data pre-processing that will be required. Thanks, and waiting for a reply!
Although the question is a bit vague regarding your requirements and the specs of your machine, I'll try to answer it. You'll need object detection for your task. There are many models available which you can use, like YOLO, SSD, etc.
To create your own dataset, you can follow these steps:
Take lots of images of your objects of interest in various conditions, viewpoints and backgrounds. (Around 2000 per class should be good enough).
Now annotate (or mark) where your object is in the image. If you're using Yolo, make use of Yolo-mark for annotating. There should be other similar tools for SSD and other models.
Now you can begin training.
These steps should get you started or at least point you in the right direction.
You can build your own dataset with this code. I wrote it, and it works correctly.
You need to set DATADIR, CATEGORIES and IMG_SIZE to match your own data.
import os
import pickle

import cv2

DATADIR = "path/to/your/dataset"           # folder containing one sub-folder per class
CATEGORIES = ["category_1", "category_2"]  # sub-folder names, used as class labels
IMG_SIZE = 100                             # resize target; pick what your model needs

training_data = []
x_train = []
y_train = []

if __name__ == "__main__":
    for category in CATEGORIES:
        path = os.path.join(DATADIR, category)
        class_num = CATEGORIES.index(category)
        for img in os.listdir(path):
            try:
                img_array = cv2.imread(os.path.join(path, img))
                new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
                training_data.append([new_array, class_num])
            except Exception:
                pass  # skip files that cannot be read as images

    for features, label in training_data:
        x_train.append(features)
        y_train.append(label)

    # create pickle files
    pickle_out = open("x_train.pickle", "wb")
    pickle.dump(x_train, pickle_out)
    pickle_out.close()

    pickle_out = open("y_train.pickle", "wb")
    pickle.dump(y_train, pickle_out)
    pickle_out.close()
In case you're starting completely from scratch, you can use "Dataset Directory", available on the Play Store. The app helps you create custom datasets using your mobile. You'll have to sign in to your Google Drive so that your dataset is stored in Drive rather than on your mobile. Additionally, it also supports labelling the entities for classification and regression predictive models.
Currently, the app supports binary image classification and image regression.
Hope this helped!
Download link:
https://play.google.com/store/apps/details?id=com.applaud.datasetdirectory

Any ideas on why my coreml model created with turicreate isn't working?

Pretty much brand new to ML here. I'm trying to create a hand-detection CoreML model using turicreate.
The dataset I'm using is from https://github.com/aurooj/Hand-Segmentation-in-the-Wild , which provides images of hands from an egocentric perspective, along with masks for the images. I'm following the steps in turicreate's "Data Preparation" (https://github.com/apple/turicreate/blob/master/userguide/object_detection/data-preparation.md) step-by-step to create the SFrame. Checking the contents of the variables throughout this process, there doesn't appear to be anything wrong.
Following data preparation, I follow the steps in the "Introductory Example" section of https://github.com/apple/turicreate/tree/master/userguide/object_detection
I get the first hint of an error when turicreate is performing iterations to create the model: there doesn't appear to be any loss at all, which doesn't seem right.
After the model is created, I try to test it with a test_data portion of the SFrame. The results of these predictions are just empty arrays though, which is obviously not right.
After exporting the model as a CoreML .mlmodel and trying it out in an app, it is unable to recognize anything (not surprisingly).
Being completely new to model creation, I can't figure out what might be wrong. The dataset seems quite accurate to me. The only changes I made to the dataset were that some of the masks didn't have explicit file extensions (they are PNGs), so I added the .png extension. I also renamed the images to follow turicreate's tutorial formats (i.e. vid4frame025.image.png and vid4frame025.mask.0.png). Again, the SFrame creation process using this data seems correct at each step. I was able to follow the process with turicreate's tutorial dataset (bikes and cars) successfully. Any ideas on what might be going wrong?
I found the problem, and it basically stemmed from my unfamiliarity with Python.
In one part of the Data Preparation section, after creating bounding boxes out of the mask images, each annotation is assigned a 'label' indicating the type of object the annotation is meant to be. My data had a different name format than the tutorial's data, so rather than each annotation having 'label': 'bike', my annotations had 'label': 'vid4frame25', 'label': 'vid4frame26', etc.
Correcting this so that each annotation has 'label': 'hand' seems to have fixed it (or at least it's creating a legitimate-seeming model so far).

How to implement new .mlmodel in this project

I am trying to create a handwritten digit recogniser that uses a core ml model.
I am taking the code from another similar project:
https://github.com/r4ghu/iOS-CoreML-MNIST
But I need to incorporate my ML model into this project.
This is my model (the input image is 299x299):
https://github.com/LOLIPOP-INTELLIGENCE/createml_handwritten
My question is: what changes need to be made in that project so that it incorporates my Core ML model?
I tried changing the shapes to 299x299, but that gives me an error.
In viewDidLoad, you should change the number 28 to 299 in the call to CVPixelBufferCreate(). In the original app, the mlmodel expects a 28x28 image but your model uses 299x299 images.
However, there is something else you need to change as well: replace kCVPixelFormatType_OneComponent8 with kCVPixelFormatType_32BGRA or kCVPixelFormatType_32RGBA. The original model uses grayscale images but yours expects color images.
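A sketch of what the adjusted call might look like (the variable names here are illustrative and not taken from the linked project):
import CoreVideo

var pixelBuffer: CVPixelBuffer?
let status = CVPixelBufferCreate(kCFAllocatorDefault,
                                 299,                        // width your model expects
                                 299,                        // height your model expects
                                 kCVPixelFormatType_32BGRA,  // color instead of one-channel grayscale
                                 nil,
                                 &pixelBuffer)
guard status == kCVReturnSuccess, let buffer = pixelBuffer else {
    fatalError("Could not create pixel buffer")
}
// `buffer` now has the 299x299 BGRA layout the new model expects.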
P.S. Next time include the actual error message in your question. That's an important piece of information for people who are trying to answer. :-)

How do I do binary image classification using ImageJ?

I want to do binary image classification and I thought ImageJ would be a good platform for it, but alas, I am at a loss. The pseudocode for what I want to do is below. How do I implement this in ImageJ? It does not have to follow the pseudocode exactly; I just hope it gives an idea of what I want to do. (If there is a plugin or a more suitable platform that does this already, can you point me to that?)
For training:
train_directory = get folder from user
train_set = get all image files in train_directory
class_file = get text file from user
// each row of class_file contains image name and a "1" or "-1"
// a "1" indicates that image belongs to the positive class
model = build_model_from_train_set_and_class_file
write_model_to_output_file
// end train
For testing:
test_directory = get folder from user
test_set = get all images in test_directory
if (user wants to)
    class_file = get text file from user
    // normally for testing one would always know the class of the image
    // the if statement is just in case the user does not
model = read_model_file
output = classify_images_as_positive_or_negative
write_output_to_file
Note that there should not be any preprocessing done by the user: no additional set of masks, no drawings, no additional labels, etc. The algorithm must be able to figure out what is common among the training images from the training images alone and build up a model appropriately. Of course, the algorithm can do any preprocessing it wants to; it just cannot rely on that preprocessing having been done already.
I tried using CVIPtools for this but it wants a mask on all of the images to do feature extraction.
