I want to integrate an ML project with Next.js for real-time interaction.
I am using a MediaPipe model for real-time face detection. One of the crucial steps involved is
results = model.process(image)
where image is an array of pixel colors of a single frame captured with cv2
and model is a pre-trained MediaPipe Holistic model.
Now, on the frontend side, I can access the user's webcam with navigator.mediaDevices and obtain a MediaStream of the user's video. I am aware of Socket.IO and WebRTC for real-time communication, but I can't figure out how to convert my MediaStream into a Python array.
Also, will this really be feasible in real time? I will have to send the user's stream to the backend, let the model calculate the result, and send the result back to the frontend for display.
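For reference, one way the server side of that pipeline could look is sketched below. It assumes the frontend grabs frames from the MediaStream (for example via a canvas), JPEG-encodes them, and emits them over Socket.IO as a "frame" event; the event names and the use of Flask-SocketIO are assumptions, not part of the question.

    # Hypothetical Flask-SocketIO receiver for JPEG-encoded frames from the browser.
    import cv2
    import numpy as np
    import mediapipe as mp
    from flask import Flask
    from flask_socketio import SocketIO, emit

    app = Flask(__name__)
    socketio = SocketIO(app, cors_allowed_origins="*")
    model = mp.solutions.holistic.Holistic()

    @socketio.on("frame")
    def handle_frame(jpeg_bytes):
        # Decode the received bytes into the same kind of pixel array cv2 produces.
        image = cv2.imdecode(np.frombuffer(jpeg_bytes, dtype=np.uint8), cv2.IMREAD_COLOR)
        # MediaPipe expects RGB input.
        results = model.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
        # Return only lightweight landmark data, never the image itself.
        emit("landmarks", {"has_face": results.face_landmarks is not None})

    if __name__ == "__main__":
        socketio.run(app)

Whether this stays real-time depends mostly on how much data is moved: downscaling frames, capping the capture rate (say 10-15 fps), and sending back only landmark coordinates rather than images keeps the round trip small.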
I am a beginner with ARKit. I have a question for you: is there an interface or API in ARKit that allows me to input static images of faces and get blendshape data directly?
Every approach I've found while scouring the web requires real-time face tracking with a depth camera, but I only want blendshape data from static images.
Thanks a lot~
Currently I have a system of ML models that run in their own processes in Python. It works perfectly when a single video camera feed is the input, but now I need to feed in video from multiple sources, and I only have the resources to run one instance of the model. I tried to batch-process multiple video streams, but it does not scale well beyond 5 cameras. Is there a Python framework or pipeline that could help here? Please suggest one.
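One common pattern is sketched below using only the standard library plus OpenCV (the batch size, queue size, and the run_model placeholder are illustrative, not from the question): each camera gets a lightweight reader process that pushes (camera_id, frame) pairs onto a shared queue, and a single inference process drains the queue and feeds the model in batches.

    # Sketch: fan frames from several cameras into one shared inference process.
    from multiprocessing import Process, Queue
    import cv2

    def camera_reader(camera_id, source, queue):
        cap = cv2.VideoCapture(source)
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # Drop frames instead of blocking if inference falls behind.
            if not queue.full():
                queue.put((camera_id, frame))

    def inference_worker(queue, batch_size=4):
        while True:
            batch = [queue.get()]                      # block for the first frame
            while len(batch) < batch_size and not queue.empty():
                batch.append(queue.get())              # opportunistically fill the batch
            ids, frames = zip(*batch)
            # results = run_model(frames)              # run_model is a placeholder for
            # ...route each result back to ids[i]...   # the single existing model instance

    if __name__ == "__main__":
        queue = Queue(maxsize=64)
        readers = [Process(target=camera_reader, args=(i, i, queue)) for i in range(5)]
        worker = Process(target=inference_worker, args=(queue,))
        for p in readers + [worker]:
            p.start()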
I've retrained an ssd_mobilenet_v2 via the TensorFlow Object Detection API on my custom class. I now have a frozen_inference_graph.pb file, which is ready to be embedded into my app.
The tutorials on TensorFlow's GitHub and website only show how to use it with the iOS built-in camera stream. Instead, I have an external camera for my iPhone, which streams into a UIView component. I want my network to detect objects in that view, but my research doesn't turn up any obvious implementations or tutorials.
My question: does anyone know whether this is possible? If so, what's the best way to implement it: TensorFlow Lite, TensorFlow Mobile, Core ML, or Metal?
Thanks!
In that TensorFlow example code, the file CameraExampleViewController.mm contains a method runCNNOnFrame that takes a CVPixelBuffer object (from the camera) as input and copies its contents into image_tensor_mapped.data(). It then runs the TF graph on that image_tensor object.
To use a different image source, such as the contents of a UIView, you need to first read the contents of that view into some kind of memory buffer (typically a CGImage) and then copy that memory buffer into image_tensor_mapped.data().
It might be easier to convert the TF model to Core ML (if possible) and then use the Vision framework to run it, since Vision can take a CGImage directly as input. That saves you from having to convert the image into a tensor first.
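If you go the Core ML route, the conversion itself is an offline Python step. A rough sketch with the tf-coreml converter is below; the tensor names and shapes are placeholders (inspect your exported graph for the real ones), and SSD graphs usually need their pre/post-processing ops handled separately before conversion.

    # Hypothetical conversion of the frozen graph to a .mlmodel with tfcoreml.
    import tfcoreml

    tfcoreml.convert(
        tf_model_path="frozen_inference_graph.pb",
        mlmodel_path="detector.mlmodel",
        input_name_shape_dict={"Preprocessor/sub:0": [1, 300, 300, 3]},
        image_input_names=["Preprocessor/sub:0"],
        output_feature_names=["concat:0", "concat_1:0"],
    )

The resulting .mlmodel can then be driven from a VNCoreMLRequest via a VNImageRequestHandler, which accepts a CGImage directly.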
I am trying to build an iOS app where I have a mono input (real-time from the mic) and a two-channel impulse response. The input needs to be convolved with the impulse response in real time to produce a two-channel (stereo) output. Is there a way to do that on iOS with Apple's Audio Toolbox?
You should first decide whether you will be doing the convolution in the time domain or the frequency domain; there are benefits to both, depending on the length of your signal and impulse response. That is something you should research for your particular case.
For the time domain, rolling your own convolution is straightforward enough. For the frequency domain you will need an FFT function; you could roll your own, but more efficient implementations already exist. The Accelerate framework, for example, has this built in. A rough illustration of both options follows below.
But for basic I/O, Audio Toolbox is a valid choice.
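To make the trade-off concrete, here is a small NumPy illustration (not iOS code; on the device the same steps would be expressed with Accelerate/vDSP routines) of direct time-domain convolution versus FFT-based convolution, and of how a mono input and a two-channel impulse response give a stereo output. The signal lengths are arbitrary.

    # Time-domain vs. frequency-domain convolution, illustrated with NumPy.
    import numpy as np

    def convolve_time_domain(x, h):
        # Direct convolution: cost grows with len(x) * len(h).
        return np.convolve(x, h)

    def convolve_freq_domain(x, h):
        # FFT convolution: zero-pad both to the full output length, multiply spectra.
        n = len(x) + len(h) - 1
        return np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(h, n), n)

    # Stereo output from a mono input and a 2-channel impulse response:
    # convolve the mono signal with each IR channel separately.
    x = np.random.randn(4800)         # one block of mono input
    ir = np.random.randn(2, 2048)     # two-channel impulse response
    stereo = np.stack([convolve_freq_domain(x, ir[ch]) for ch in range(2)])

For real-time streaming you would process the input block by block (overlap-add or overlap-save) rather than convolving whole buffers at once.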
I am currently working with Google Tango and the Microsoft HoloLens. My idea is to scan a room or an object with Google Tango, then convert the scan and display it as a hologram on the HoloLens.
For that I need to get the ADF file onto my computer.
Does anyone know of a way to import ADF files onto a computer?
And do you know if it is possible to convert ADF files into usable 3D files?
An ADF is not a 3D scan of the room; it's a collection of feature descriptors from the computer vision algorithms, with associated positional data, and the format is not documented.
You will want to use the point cloud from the depth sensor instead: convert it to a mesh (there are existing apps that do this) and import the mesh into a render engine on the HoloLens.
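If you end up doing the point-cloud-to-mesh step yourself rather than with an existing app, a rough offline sketch with the Open3D library (a tool choice not taken from the answer) could look like this, assuming the Tango point cloud has been exported to a PLY file:

    # Offline reconstruction sketch: point cloud (PLY) -> triangle mesh (OBJ).
    import open3d as o3d

    pcd = o3d.io.read_point_cloud("tango_scan.ply")   # exported point cloud (assumed filename)
    pcd.estimate_normals()                            # Poisson reconstruction needs normals
    mesh, _ = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=9)
    o3d.io.write_triangle_mesh("room_mesh.obj", mesh) # import this into the HoloLens render engine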