I've been trying to implement some code where, given an image with some depth data, it specifically returns the part of the image that is focused/closest to the camera. If it's a person, their face; if it's a plant, the branch. Effectively I'm trying to get the part of the image which would be focused on if the image was taken using Portrait mode in the camera app.
I've been reading this documentation but I've not been able to think of a way to manipulate the data here. My guess would be to use embedsDepthDataInPhoto and then use the depth data in some way to discard any parts of the image that are at or beyond a certain distance from the camera. I'm quite new to this, so any help would be greatly appreciated.
I am currently playing a bit with ARKit. My goal is to detect a shelf and draw stuff onto it.
I did already find ARReferenceImage and that basically works for a very, very simple prototype, but it seems the image needs to be quite complex? Xcode always complains if I try to use something a lot simpler (like a QR-code-like image). With that marker I would know the position of an edge, and then I'd know the physical size of my shelf and how to place stuff into it. So that would be OK, but I think small and simple markers will not work, right?
But ideally I would not need a marker at all.
I know that I can detect e.g. planes, but I want to detect the shelf itself. But as my shelf is open, it's not really a plane. Are there other possibilities to find an object using ARKit?
I know that my question is very vague, but maybe somebody could point me in the right direction. Or tell me if that's even possible with ARKit or if I need other tools? Like Unity?
There are several different possibilities for positioning content in augmented reality. They are called content anchors, and they are all subclasses of the ARAnchor class.
Image anchor
Using an image anchor, you would stick your reference image on a pre-determined spot on the shelf and position your 3D content relative to it.
"the image needs to be quite complex it seems? Xcode always complains if I try to use something a lot simpler (like a QR-Code like image)"
That's correct. The image needs to have enough visual detail for ARKit to track it. Something like a simple black and white checkerboard pattern doesn't work very well. A complex image does.
Object anchor
Using object anchors, you scan the shape of a 3D object ahead of time and bundle this data file with your app. When a user uses the app, ARKit will try to recognise this object and if it does, you can position your 3D content relative to it. Apple has some sample code for this if you want to try it out quickly.
Manually creating an anchor
Another option would be to enable ARKit plane detection, and have the user tap a point on the horizontal shelf. Then you perform a raycast to get the 3D coordinate of this point.
You can create an ARAnchor object using this coordinate, and add it to the ARSession.
Then you can again position your content relative to the anchor.
You could also implement a drag gesture to let the user fine-tune the position along the shelf's plane.
Conclusion
Which one of these placement options is best for you depends on the use case of your app. I hope this answer was useful :)
References
There are a lot of informative WWDC videos about ARKit. You could start off by watching this one: https://developer.apple.com/videos/play/wwdc2018/610
It is absolutely possible. Whether you do this in Swift or in Unity depends entirely on what you are comfortable working in.
ARKit calls these object anchors: https://developer.apple.com/documentation/arkit/arobjectanchor. In other implementations they are often called mesh or model targets.
This YouTube video shows what you want to do in Swift.
But objects like a shelf might be hard to recognize since their content often changes.
I'm currently an MS student in Medical Physics and I have a great need to be able to overlay an isodose distribution from an RTDOSE file onto a CT image from a .dcm file set.
I've managed to extract the image and the dose pixel arrays myself using pydicom and dicom_numpy, but the two arrays are not the same size! So, if I overlay the two together, the dose will not be in the correct position based on what the Elekta Gamma Plan software exported it as.
I've played around with dicompyler and 3DSlicer and they obviously are able to do this even though the arrays are not the same size. However, I think I cannot export the numerical data from these tools; I can only scroll through and view it as an image. How can I overlay the RTDOSE onto a CT image?
Thank you
For what you want, it sounds like you should use SimpleITK (or an equivalent - my experience is with SITK) to do the DICOM handling, not pydicom.
DICOM has a complete built-in system for specifying the 3D position of all pixel data in patient coordinates. This uses a number of attributes in the Image Plane Module set of tags. See here for a good overview.
The SimpleITK library fully understands and uses the full 3D Image Plane tags to identify and locate any image in patient coordinates by default - irrespective of things such as the specific pixel spacing, slice thickness, etc.
So - in your case - if you use SITK to open your studies, then you should be able to overlay them correctly "out of the box", because SITK will do all the work to parse the Image Plane Module tags and locate the data in patient coordinates - just like you get with 3DSlicer.
Pydicom, in contrast, doesn't itself try to use any of that information at all. It only gives you the raw pixel arrays (for images).
Note I use both pydicom and SITK. This isn't something bad about pydicom, but more a question of right tool for the job. In fact, for many (most?) things I use pydicom, but for any true 3D type work, SITK is the easier toolkit to use.
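To make that concrete, here is a minimal sketch of the SITK route, assuming the CT slices sit in one directory and the RTDOSE is a single file (both paths are placeholders). The dose is resampled onto the CT grid, so the two arrays end up the same size, and the values are scaled to Gy with the DoseGridScaling tag read via pydicom (which you already use):

```python
import SimpleITK as sitk
import pydicom

ct_dir = "path/to/ct_series"      # placeholder: directory with the CT slices
dose_path = "path/to/rtdose.dcm"  # placeholder: the RTDOSE file

# Read the CT as one 3D volume; SITK places it in patient coordinates
# using the Image Plane Module tags.
reader = sitk.ImageSeriesReader()
reader.SetFileNames(reader.GetGDCMSeriesFileNames(ct_dir))
ct = reader.Execute()

# The RTDOSE grid is likewise positioned in patient coordinates when read.
dose = sitk.ReadImage(dose_path)

# Resample the dose onto the CT grid. Because both images carry their own
# origin, spacing and direction, an identity transform is enough to align them.
dose_on_ct = sitk.Resample(dose, ct, sitk.Transform(),
                           sitk.sitkLinear, 0.0, sitk.sitkFloat32)

# RTDOSE pixel values are stored as integers; scale them to Gy with the
# DoseGridScaling attribute.
scaling = float(pydicom.dcmread(dose_path).DoseGridScaling)
dose_gy = sitk.GetArrayFromImage(dose_on_ct) * scaling
ct_array = sitk.GetArrayFromImage(ct)

# dose_gy and ct_array now have the same shape and can be overlaid directly.
print(ct_array.shape, dose_gy.shape)
```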
I can use scikit-learn to train a model and recognize objects, but I also need to be able to tell where in my test images the object is residing. Is there some way I could get the coordinates of the part of the test image which contains the object I'm trying to recognize?
If not, please refer me to some other library that'll help me achieve this task.
Thank you
I assume that you are talking about a computer vision application. Usually, the way a box is drawn around an identified object is by using a sliding window and running your classifier on each window as it steps across the image. You can keep track of which windows come back with positive results and use those windows as your bounds. You may wish to use windows of various sizes if the object scale changes from image to image. In that case, you would likely want to prefer the smaller of two overlapping windows.
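As a rough illustration, here is a minimal sliding-window sketch; `clf` stands in for whatever fitted scikit-learn classifier you already have (assumed here to be trained on flattened patches of the window size), and the window size and step are placeholder values:

```python
import numpy as np

def sliding_window_detect(image, clf, win=(64, 64), step=16):
    # image: 2D grayscale numpy array; clf: any fitted scikit-learn
    # estimator trained on flattened win-sized patches (an assumption here).
    boxes = []
    h, w = image.shape[:2]
    for y in range(0, h - win[1] + 1, step):
        for x in range(0, w - win[0] + 1, step):
            patch = image[y:y + win[1], x:x + win[0]]
            # Flatten the patch into the same feature layout the model saw.
            if clf.predict(patch.reshape(1, -1))[0] == 1:
                boxes.append((x, y, win[0], win[1]))  # (x, y, width, height)
    return boxes
```

Running this with a couple of window sizes and keeping the smaller of any two overlapping boxes gives you the coordinates you are after.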
I want to track the relative position of a camera aimed at a computer screen.
I can’t control what is displayed on the computer screen but I can receive screen dumps whenever something changes on the screen. Those screen dumps can hopefully be used to find the screen when analyzing the video from the camera.
I see many videos on YouTube for face, logo, or single-colored object tracking using OpenCV, but I'm unsure those methods would work for finding and tracking a more detailed image like a screen dump.
Maybe Template Matching is the way to go? But I need to find the screen even at an angle.
Basically I don’t know where to begin and need help from people with experience in this field to find the best way for achieving what I want.
Thanks
Using feature matching should do the trick (SIFT/SURF/ORB/...).
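For example, a minimal OpenCV sketch with ORB, assuming the latest screen dump and a camera frame are available as image files (the file names are placeholders). The RANSAC homography is what lets you locate the screen even when it is viewed at an angle:

```python
import cv2
import numpy as np

template = cv2.imread("screen_dump.png", cv2.IMREAD_GRAYSCALE)  # placeholder
frame = cv2.imread("camera_frame.png", cv2.IMREAD_GRAYSCALE)    # placeholder

orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(template, None)
kp2, des2 = orb.detectAndCompute(frame, None)

# Hamming distance is the right metric for ORB's binary descriptors.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

# The homography maps screen-dump coordinates into the camera frame,
# which also handles perspective distortion from viewing at an angle.
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# Project the screen dump's corners into the frame to outline the screen.
h, w = template.shape
corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
outline = cv2.perspectiveTransform(corners, H)
print(outline.reshape(-1, 2))
```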
I am working on a project where I take a video with a camera and convert this video to frames (this part of the project is done).
What I am facing now is how to detect moving objects in these frames and differentiate them from the background so that I can distinguish between them.
I recently read an awesome CodeProject article about this. It discusses several approaches to the problem and then walks you step by step through one of the solutions, with complete code. It's written at a very accessible level and should be enough to get you started.
One simple way to do this (if little noise is present; I recommend a smoothing kernel, though) is to compute the absolute difference of two consecutive frames. You'll get an image of the things that have "moved". The background needs to be pretty static for this to work. If you always take the absolute difference between the current frame and the nth frame, you'll have a grayscale image containing the object that moved. The object has to be different from the background color or it will disappear...
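For instance, a minimal sketch of that frame-differencing idea (the file names are placeholders; written against OpenCV 4, where findContours returns two values):

```python
import cv2

prev = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)  # placeholder frames
curr = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# Smooth first to suppress sensor noise (the smoothing kernel mentioned above).
prev = cv2.GaussianBlur(prev, (5, 5), 0)
curr = cv2.GaussianBlur(curr, (5, 5), 0)

# Pixels that changed between the two frames, i.e. things that "moved".
diff = cv2.absdiff(curr, prev)
_, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)

# Group the changed pixels into regions and box each moving object.
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    if cv2.contourArea(c) > 100:  # ignore tiny specks of noise
        x, y, w, h = cv2.boundingRect(c)
        print("moving region:", x, y, w, h)
```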