I want to make an image viewer for large images (up to 2000 x 8000 pixels) with very responsive zooming and panning, say 30 FPS or more. One option I came up with is to create a 3D scene with the image as a sort of fixed billboard, then move the camera forward/back to zoom and up/down/left/right to pan.
That basic idea seems reasonable to me, but I don't have experience with 3D graphics. Is there some fundamental thing I'm missing that will make my idea difficult or impossible? What might cause problems or be challenging to implement well? What part of this design will limit the maximum image size? Any guesses as to what framerate I might achieve?
I also welcome any guidance or suggestions on how to approach this task for someone brand new to Direct3D.
That seems quite doable to me. 30 FPS is actually a low target; you can certainly achieve a solid 60 at minimum.
One image at 8k x 2k resolution takes about 100 MB of VRAM (with mipmaps), so with today's graphics cards it's not much of an issue. You will of course face challenges if you need several loaded at the same time.
DirectX 11 supports textures up to 16k x 16k, so you should be sorted on maximum size.
If you just want to show your image flat, you shouldn't even need 3D transformations; 2D scaling/translation will do just fine.
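For anyone checking the arithmetic behind that VRAM estimate, here is a small sketch. The 2048 x 8192 padding to the next power of two and the 4 bytes per pixel (RGBA8) are assumptions, not something the answer specifies:

```python
# Rough VRAM estimate for an RGBA8 texture with a full mipmap chain.
# Assumes the 2000 x 8000 image is padded to the next power of two
# (2048 x 8192), which older hardware may require.

def mip_chain_bytes(width, height, bytes_per_pixel=4):
    """Sum the sizes of all mip levels down to 1x1."""
    total = 0
    while True:
        total += width * height * bytes_per_pixel
        if width == 1 and height == 1:
            break
        width = max(1, width // 2)
        height = max(1, height // 2)
    return total

base = 2048 * 8192 * 4              # 64 MiB for the top level alone
full = mip_chain_bytes(2048, 8192)  # mipmaps add roughly one third
print(base / 2**20, full / 2**20)   # ~64 MiB vs ~85 MiB
```

So "about 100 megs" is in the right ballpark, with headroom for driver overhead.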
I've written an app with an object detection model and process images when an object is detected. The problem I'm running into is when an object is detected with 99% confidence but the frame I'm processing is very blurry.
I've considered analyzing the frame and attempting to detect blurriness or detecting device movement and not analyzing frames when the device is moving a lot.
Do you have any other suggestions for only processing non-blurry photos, or solutions other than the ones I've proposed? Thanks
You might have issues detecting "movement" when, for instance, driving in a car. In that case, looking at something inside your car doesn't count as movement, while looking at something outside does (unless it's far away). There can be many other cases like this.
I would start by checking whether the camera is in focus. That's not the same as checking whether a frame is blurry, but it may be very close.
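One standard proxy for that focus check (my suggestion, not something the answer names) is the variance of the Laplacian: sharp frames have strong edges, so the Laplacian response varies a lot, while blurry frames give a low variance. A minimal NumPy sketch, with a placeholder threshold you would tune on real footage:

```python
import numpy as np

# Focus check sketch: variance of the Laplacian over a grayscale frame.
# The threshold below is a made-up placeholder to tune on real data.

LAPLACIAN = np.array([[0, 1, 0],
                      [1, -4, 1],
                      [0, 1, 0]], dtype=np.float64)

def laplacian_variance(gray):
    """Variance of the Laplacian over a 2-D float grayscale image."""
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += LAPLACIAN[dy, dx] * gray[dy:dy + h - 2, dx:dx + w - 2]
    return out.var()

def looks_in_focus(gray, threshold=100.0):
    return laplacian_variance(gray) >= threshold

# A checkerboard (sharp edges) scores far higher than a flat gray frame.
sharp = np.indices((64, 64)).sum(axis=0) % 2 * 255.0
flat = np.full((64, 64), 128.0)
print(laplacian_variance(sharp), laplacian_variance(flat))
```

In a real app you would run this on a downscaled luma plane to keep it cheap.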
The other option I can think of is simply checking two or more sequential frames to see if they are roughly the same. To do that, it is best to define a grid, for instance 16x16, on which you evaluate average values. You would need to mipmap your photos, which done manually means halving the image repeatedly until you get to a 16x16 image (e.g. a 2000x1500 photo would first become 1024x1024, then 512x512 -> 256x256 ...). Then grab those 16x16 pixels and store them. Once you have enough frames (at least 2) you can start comparing the values: find the average per-pixel difference between two sequential 16x16 buffers, then use that to decide whether detection should be enabled. The GPU is perfect for resizing, but those 16x16 values are probably best evaluated on the CPU.
This procedure may still not be perfect, but it should be quite feasible from a performance perspective. There may be shortcuts, as some tools already do resizing, so you may not need to halve the images manually. In theoretical terms, you are creating sectors and computing their average color. If all the sectors have nearly the same color across two or more frames, there is a high chance the camera did not move much in that time and the image should not be blurred by movement. Still, if the camera is out of focus you can have multiple sequential frames that are exactly the same yet all blurry; the same applies if you detect phone movement instead.
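The halve-down-to-a-16x16-grid comparison described above can be sketched roughly like this in NumPy (the similarity threshold is a made-up placeholder):

```python
import numpy as np

# Sketch of the sector-averaging idea: repeatedly halve a grayscale frame
# by averaging 2x2 blocks until it is 16x16, then compare the mean absolute
# difference between two consecutive frames' grids.

def halve(img):
    """Downsample by 2 in each dimension by averaging 2x2 blocks."""
    h, w = img.shape
    return img[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def to_grid(img, size=16):
    while img.shape[0] > size or img.shape[1] > size:
        img = halve(img)
    return img

def frames_similar(a, b, threshold=3.0):
    """True if the average per-sector difference is below the threshold."""
    return np.abs(to_grid(a) - to_grid(b)).mean() < threshold

rng = np.random.default_rng(0)
frame1 = rng.uniform(0, 255, (256, 256))
frame2 = frame1 + rng.normal(0, 1, (256, 256))   # nearly identical frame
frame3 = rng.uniform(0, 255, (256, 256))         # unrelated frame
print(frames_similar(frame1, frame2), frames_similar(frame1, frame3))
```

On a device you would do the halving on the GPU and only the final 16x16 comparison on the CPU, as the answer suggests.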
I'm currently making an app for an assessment which generates a maze using Recursive Backtracking. It's a 25x25 grid maze, with each wall being a separate SKSpriteNode (I read up that using SKShapeNodes was not efficient).
However, there are about 1300 nodes in the scene, which is causing some frame rate issues, even on my iPhone X. It's currently idling at about 15-30 fps, which really isn't ideal.
Are there any ideas on how to cache SKSpriteNodes, or otherwise improve performance? I'm probably overlooking many things and not creating the walls in the most efficient way, but the frame rate seems way too low to be correct.
If anyone could suggest something or nudge me in the right direction, that would be a huge help.
I highly recommend using SKTextures for repeated, identical images. See Creating a Textured Sprite Node.
For optimal performance, create your sprite images ahead of time and put them in a texture atlas in your asset catalog. For creating texture atlases, see the documentation for SKTextureAtlas.
I have a video of soccer in which the players are relatively far away from the camera and thus represent small portions of the image. I'm using background subtraction to detect the players and the results are fine but I have been asked to try detecting using Hog.
I tried using detectMultiScale with the default descriptors provided by OpenCV, but I can't get any detections. I don't really understand how to make it work in this case, because on other sequences where the people are closer to the camera, the detector works fine.
Here is a sample image link
Thanks.
The descriptor you use with HOG determines the minimum size of person you can detect: with the DefaultPeopleDetector the detection window is 128 pixels high x 64 wide, so you can detect people around 90px high. With the Daimler descriptor the size you can detect is a bit smaller.
Your pedestrians are still too small for this, so you may need to magnify the whole image, or just the parts which show up as foreground using background segmentation.
Have a look at the function definition for detectMultiScale: http://docs.opencv.org/modules/objdetect/doc/cascade_classification.html#cascadeclassifier-detectmultiscale
It might be that you need to reduce the value of minSize to detect smaller people, or the people might simply be too far away.
In my application I need to play video in an unusual way, something like an interactive player for special purposes.
Main issues here:
video resolution can be from 200x200 px up to 1024x1024 px
I need the ability to change the speed from -60 FPS to 60 FPS (the video plays slower or faster depending on the selected speed; negative means it plays backwards)
I need to draw lines and objects over the video and scale them with the image
I need the ability to zoom the image and pan it if its content is larger than the screen
I need the ability to change brightness and contrast and to invert the colors of the video
Here is what I'm doing now:
I split my video into JPEG frames
I created a timer that fires N times per second (play speed control)
on each timer tick I draw a new texture (the next JPEG frame) with OpenGL
for zoom and pan I use OpenGL ES transformations (translate, scale)
Everything looks fine while I use 320x240 px, but with 512x512 px my play rate drops. Maybe it's a timer behavior problem, maybe OpenGL. Sometimes, if I try to open big textures at a high play rate (more than 10-15 FPS), the application just crashes with memory warnings.
What is the best practice for solving this? What direction should I dig in? Would cocos2d or another game engine help me? Maybe JPEG is not the best choice for textures and I should use PNG, PVR, or something else?
Keep the video data as a video and use AVAssetReader to get the raw frames. Use kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange as the colorspace, and do YUV->RGB colorspace conversion in GLES. It will mean keeping less data in memory, and make much of your image processing somewhat simpler (since you'll be working with luma and chroma data rather than RGB values).
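For reference, the YUV -> RGB math the GLES fragment shader would implement looks roughly like this. These are the BT.601 video-range coefficients, which is my assumption for what the BiPlanarVideoRange format implies; shown in NumPy for clarity:

```python
import numpy as np

# Sketch of the Y'CbCr -> RGB conversion a GLES shader would perform,
# using BT.601 video-range (16-235 luma) coefficients.

def yuv_to_rgb(y, cb, cr):
    """Convert video-range Y'CbCr samples (0-255 floats) to RGB."""
    y = 1.164 * (y - 16.0)
    cb = cb - 128.0
    cr = cr - 128.0
    r = y + 1.596 * cr
    g = y - 0.392 * cb - 0.813 * cr
    b = y + 2.017 * cb
    return np.clip(np.stack([r, g, b]), 0.0, 255.0)

# Neutral chroma with mid luma maps to a mid gray.
print(yuv_to_rgb(np.float64(126.0), np.float64(128.0), np.float64(128.0)))
```

In the shader you sample the luma texture for Y and the half-resolution chroma texture for Cb/Cr, then apply the same three dot products per fragment. Brightness/contrast adjustments become trivial tweaks to Y before the conversion.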
You don't need to bother with Cocos 2d or any game engine for this. I strongly recommend doing a little bit of experimenting with OpenGL ES 2.0 and shaders. Using OpenGL for video is very simple and straightforward, adding a game engine to the mix is unnecessary overhead and abstraction.
When you upload image data to the textures, do not create a new texture every frame. Instead, create two textures: one for luma, and one for chroma data, and simply reuse those textures every frame. I suspect your memory issues are arising from using many images and new textures every frame and probably not deleting old textures.
JPEG frames will be incredibly expensive to uncompress. First step: use PNG.
But wait! There's more.
Cocos2D could help you mostly through its great support for sprite sheets.
The biggest help, however, may come from packed textures a la TexturePacker. Using PVR.CCZ compression can speed things up by insane amounts, enough for you to get better frame rates at bigger video sizes.
Vlad, the short answer is that you will likely never get all of the features you've listed working at the same time. Playing 1024 x 1024 video at 60 FPS is really going to be a stretch; I highly doubt iOS hardware can keep up with those kinds of data transfer rates at 60 FPS. Even the device's h.264 hardware can only do 30 FPS at 1080p. It might be possible, but then layering graphics rendering over the video and also expecting to edit brightness/contrast at the same time is just too many things at once.
You should focus on what is actually possible instead of attempting every feature. If you want to see an example Xcode app that pushes iPad hardware right to its limits, have a look at my Fireworks example project. That code displays multiple already-decoded h.264 videos on screen at the same time. The implementation is built around CoreGraphics APIs, but the key thing is that Apple's implementation of texture uploading to OpenGL is very fast thanks to a zero-copy optimization. With this approach, a lot of video can be streamed to the device.
How can I reduce moire effects when downsampling halftone comic book images during live zoom on an iPhone or iPad?
I am writing a comic book viewer. It would be nice to provide higher resolution images and allow the user to zoom in while reading the comic book. However, my client is averse to moire effects and will not allow this feature if there are noticeable moire artifacts while zooming, which of course there are.
Modifying the images to be less susceptible to moire would only be acceptable if the modifications were imperceptible. Blur was specifically prohibited, as is anything that removes the beloved halftone dots.
The images are black and white halftone and line art. The originals are 600 dpi but what we ship with the application will be half that at best, so probably 2500 pixels or less tall.
So what are my options? If I write a custom downsampling algorithm would it be fast enough for real time on these devices? Are there other tricks I can do? Would it work to just avoid the size ratios that have the most visual moire effects?
As you zoom in and out, there are definite peaks where the moire effects are worst. Is there a way to calculate where those points are and just snap to a nearby scale that is not as bad?
Any suggestions are welcome. I have very little experience with image and signal processing, but am enjoying the opportunity to learn. I know nothing of wavelets and acutance and other jargon, so please be verbose.
Edit:
For now at least we are punting on dynamic zoom. We will support zooming in to full magnification but not arbitrary scaling. I hope to revisit this in the future.
Moire effects occur due to aliasing. Aliasing occurs due to the sampling frequency being too low compared to the frequency content of the signal/image.
I can't really see any way to avoid this without applying blur. If you choose your blur filter well, you should be able to get results that do not look "blurred" at all.
Since blur is simple to implement I would implement downsampling with blurring and show it to the customer. If they are happy with the results, then all should be well.
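To make the aliasing argument concrete, here is a tiny 1-D NumPy sketch: naively subsampling a halftone-like pattern can turn a 50% gray field into solid black, while a box average matched to the downsampling factor (a crude blur) keeps the correct gray:

```python
import numpy as np

# A 1-D "halftone" with a dot every other pixel is a 50% gray field.
# Subsampling by 4 hits only the ink samples; averaging 4-pixel blocks
# (a box filter matched to the downsampling factor) keeps the gray.

halftone = np.tile([0.0, 255.0], 128)            # ink, paper, ink, paper, ...

naive = halftone[::4]                            # every 4th sample: all ink
filtered = halftone.reshape(-1, 4).mean(axis=1)  # blur + subsample: mid gray

print(naive[:4], filtered[:4])   # [0. 0. 0. 0.] vs [127.5 127.5 127.5 127.5]
```

A real 2-D resampler would use a smoother kernel (Gaussian or Lanczos), but the principle is the same: remove the frequencies above the new Nyquist limit before subsampling.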
The only other options I see are:
A custom downsampling method. Unless someone else has already come up with one, I don't think it's an option, since you yourself say you have little experience with image/signal processing.
Converting the comics to a vector format, which would give you infinite zoom.
A difficult problem in general, and an interesting one in particular. I doubt there is a good, simple solution. Perhaps if we could assume nearly ideal halftoning (monochrome images with halftone dots placed in a perfect grid), but that would hardly work for a scanned image.
If you are interested in the math and/or want some bibliography to research, this thesis might be useful (I haven't read it).
Also, you can search for descreening algorithms, plugins, etc., to get ideas.