I am running the camera iOS example distributed by TensorFlow, and it is quite slow: 4-5 seconds per inference on an iPhone 6 running the inception5h.zip model.
To my understanding, this is the GoogLeNet model, which is lightweight, and the iOS code pulls from its first output layer, which is about half the size of the full model. I ran the same model with the Python interface on my MacBook, and it takes about 30 ms per inference.
So I am wondering why the same model runs about 150x slower on iOS than on my MacBook. It seems I'm doing something obviously wrong.
This isn't well-documented yet, but you need to pass in optimization flags to the compile script to get a fast version of the library. Here's an example:
tensorflow/contrib/makefile/compile_ios_tensorflow.sh "-Os"
That should bring your speed up a lot; informally, I see a second or less with GoogLeNet on a 5S.
Related
I am currently playing around with some generative models, such as Stable Diffusion, and I was wondering whether it is technically possible, and actually sensible, to fine-tune the model on a GeForce RTX 3070 with 8GB of VRAM. It's just to play around a bit, so the dataset is small and I don't expect good results out of it, but to my understanding, if I turn the batch size down far enough and use lower-resolution images, it should be technically possible. Or am I missing something? Their repository says you need a GPU with at least 24GB.
I have not started coding yet because I wanted to first check whether it is even possible, before I end up setting everything up and then find out it does not work.
I've read about a person who was able to train using 12GB of RAM, following the instructions in this video:
https://www.youtube.com/watch?v=7bVZDeGPv6I&ab_channel=NerdyRodent
It sounds a bit painful, though. You would definitely want to try using the
--xformers
and
--lowvram
command-line arguments when you start up SD. I would love to hear how this turns out and whether you get it working.
I currently do text-to-speech using Tacotron 2 and HiFi-GAN. It works well on GPU, but after deploying to a server and running the model on CPU, the results are not as good as before.
So my question is: does inference on CPU lower the model's accuracy?
If yes, please kindly explain, or point me to any reference paper or article.
One more thing: I noticed that when I run
model.cuda().eval().half()
and save the Tacotron 2 model, the model size is reduced by half and it seems to run fine. So if I use this half-size model, will it lower the accuracy too?
You want to look into mixed-precision training (NVIDIA, TensorFlow).
Machine learning doesn't usually need high-precision floating point.
GPUs and frameworks can take this into account to speed up training.
However, at deployment time the model doesn't get the same careful handling, so small rounding differences show up in the output.
The deeper your model, the more of an issue this may be, because the slight differences add up.
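The effect is easy to demonstrate directly. This is a generic NumPy sketch, not anything specific to Tacotron 2: float16 keeps only about three significant decimal digits, and when many small contributions are accumulated, a float16 running total can stall far below the true value.

```python
import numpy as np

# Single-value rounding: float16 keeps roughly 3 significant
# decimal digits (10-bit mantissa).
x = np.float32(0.1234567)
print(np.float16(x))  # slightly off from the float32 value

# Accumulated error: summing 10,000 small steps of 1e-4.
# The true sum is 1.0, but the float16 running sum stalls once the
# step is smaller than half the gap between adjacent float16 values.
step16, step32 = np.float16(1e-4), np.float32(1e-4)
s16, s32 = np.float16(0.0), np.float32(0.0)
for _ in range(10_000):
    s16 = np.float16(s16 + step16)
    s32 = np.float32(s32 + step32)

print(float(s16))  # stalls well below 1.0
print(float(s32))  # close to 1.0
```

This is the "slight differences add up" point in miniature: each individual rounding is tiny, but a deep network applies thousands of them in sequence.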
I'm new to Core ML and have been toying with it since today. I tried to run a machine-learning model called MobileNet SSD in real time. It works, but it's rather slow: I see people talking about 20-30 fps at the least, while my MacBook gets to maybe three at most. I'm not sure where to start looking for what I did wrong, though.
I based my project on https://github.com/vonholst/SSDMobileNet_CoreML (which is for iOS, I translated it to macOS). If I run that with the simulator it's slow there, too.
I also tried using that GitHub project with the iPhone simulator, feeding it the same image over and over again (rather than sampling from the camera), and it still gets stuck at about the same frame rate.
What could cause this?
We are building an iOS app to perform image classification using the TensorFlow library.
Using our machine learning model (91MB, 400 classes) and the TensorFlow 'simple' example, we get memory warnings on any iOS device with 1GB of RAM. Devices with 2GB do not show any warnings, while devices with less than 1GB completely run out of memory and crash the app.
We are using the latest TensorFlow code from the master branch that includes this iOS memory performance commit, which we thought might help but didn't.
We have also tried setting various GPU options on our TF session object, including set_allow_growth(true) and set_per_process_gpu_memory_fraction().
Our only changes to the TF 'simple' example code are a wanted_width and wanted_height of 299, and an input_mean and input_std of 128.
Has anyone else run into this? Is our model simply too big?
You can use memory mapping; have you tried that? TensorFlow provides documentation for it. You can also round your weight values to fewer decimal places so the model data compresses better.
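As a sketch of the weight-rounding idea (illustrative NumPy with a made-up weight tensor, not TensorFlow's actual graph transform): mapping each weight onto one of 256 evenly spaced levels bounds the per-weight error at half a level, while making the weight data far more compressible because only 256 distinct values remain.

```python
import numpy as np

# Hypothetical weight tensor standing in for one layer of the model.
np.random.seed(0)
weights = np.random.randn(1000).astype(np.float32)

# Snap each weight to the nearest of 256 levels spanning [lo, hi]
# (8-bit style quantization).
lo, hi = float(weights.min()), float(weights.max())
levels = 255
quantized = np.round((weights - lo) / (hi - lo) * levels)
dequantized = (quantized / levels) * (hi - lo) + lo

# Reconstruction error per weight is at most half a level.
step = (hi - lo) / levels
max_err = float(np.abs(dequantized - weights).max())
print(f"levels used: {len(np.unique(quantized))}, "
      f"max error: {max_err:.5f} (bound: {step / 2:.5f})")
```

The trade-off is exactly the accuracy question from the earlier thread: coarser levels mean smaller downloads but larger per-weight error.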
I am creating a test harness to test the precision of various algorithms I am creating that use the OpenCV framework on iOS.
Right now I am mostly trying to understand whether OpenCV or the iPhone itself is the culprit in the dropping of frames. Once I find out which it is, is there a way to measure how often frames are dropped, and when?
The reason I can tell frames are being dropped is the setup of my test harness. I have found that the algorithm runs at about 18 fps in real time on the phone. I then created a modified version of processImage() that processes a set of frames stored in PNG format, and with this process I get about 10 fps. Granted, I am having to convert these frames from PNG to a Mat type, which I know takes a significant amount of time, so I might be wrong.
Any advice on this subject is greatly appreciated. Thank you.
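For what it's worth, the offline-timing idea above can be sketched in a few lines. This is a Python illustration with dummy workloads, not the actual iOS/OpenCV code: timing the decode step separately from the processing step would tell you how much of the 18 fps vs 10 fps gap the PNG-to-Mat conversion accounts for.

```python
import time

def measure_fps(fn, frames):
    """Run fn over every frame and return frames processed per second."""
    start = time.perf_counter()
    for frame in frames:
        fn(frame)
    return len(frames) / (time.perf_counter() - start)

# Hypothetical stand-ins for the real steps: a "decode" (PNG -> Mat)
# and the algorithm itself. Timing them separately shows which dominates.
frames = [list(range(10_000))] * 30
decode_fps = measure_fps(lambda f: list(f), frames)   # conversion only
process_fps = measure_fps(lambda f: sum(f), frames)   # algorithm only
print(f"decode: {decode_fps:.0f} fps, process: {process_fps:.0f} fps")
```

If the conversion-only rate is close to the combined 10 fps, the conversion, not the algorithm, is what is dragging the offline measurement below the real-time 18 fps.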
Two useful links are first and second, about iOS and OpenCV.